Hybrid Blocks¶
spectrans.blocks.hybrid ¶
Hybrid spatial-spectral transformer blocks.
This module provides transformer blocks that combine different types of mixing layers, alternating between spectral and spatial processing or using adaptive selection mechanisms.
Classes:
| Name | Description |
|---|---|
| `HybridBlock` | Base class for hybrid transformer blocks. |
| `AlternatingBlock` | Block that alternates between two different mixing layers. |
| `AdaptiveBlock` | Block that adaptively selects between mixing strategies. |
| `MultiscaleBlock` | Block that processes multiple scales in parallel. |
| `CascadeBlock` | Block that cascades multiple mixing strategies sequentially. |
Examples:
Creating hybrid blocks with different strategies:
>>> from spectrans.blocks.hybrid import AlternatingBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> from spectrans.layers.attention.spectral import SpectralAttention
>>> block = AlternatingBlock(
...     layer1=FourierMixing(hidden_dim=768),
...     layer2=SpectralAttention(hidden_dim=768, num_heads=8),
...     hidden_dim=768
... )
Notes
Hybrid blocks combine multiple mixing strategies through:
- Alternating selection between different layer types
- Adaptive gating mechanisms for dynamic layer selection
- Parallel processing at multiple scales
- Sequential cascading of different transformations
Classes¶
HybridBlock ¶
HybridBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: SpectralComponent
Base class for hybrid transformer blocks.
This class provides the foundation for blocks that combine multiple mixing strategies in various ways.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. If None, 4 * hidden_dim is used. | `None` |
| `activation` | `str` | Activation function. | `'gelu'` |
| `dropout` | `float` | Dropout probability. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. |
| `ffn` | `FeedForwardNetwork \| None` | The feedforward network. |
| `dropout` | `Dropout` | Dropout layer. |
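The `ffn_hidden_dim` parameter is documented as defaulting to 4 * hidden_dim when it is left as None. A minimal sketch of that resolution rule follows; the helper name `resolve_ffn_hidden_dim` is hypothetical and is not part of spectrans.

```python
from typing import Optional


def resolve_ffn_hidden_dim(hidden_dim: int, ffn_hidden_dim: Optional[int] = None) -> int:
    """Return the FFN width, falling back to 4 * hidden_dim when none is given."""
    # Hypothetical helper: mirrors the documented default, not the library's code.
    return 4 * hidden_dim if ffn_hidden_dim is None else ffn_hidden_dim


print(resolve_ffn_hidden_dim(768))        # 3072
print(resolve_ffn_hidden_dim(768, 1024))  # 1024
```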
AlternatingBlock ¶
AlternatingBlock(layer1: MixingLayer | Module, layer2: MixingLayer | Module, hidden_dim: int, use_layer1: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that alternates between two mixing strategies.
This block can be used in alternating patterns, e.g., even layers use one type of mixing and odd layers use another.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer1` | `MixingLayer \| Module` | First mixing layer. | required |
| `layer2` | `MixingLayer \| Module` | Second mixing layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `use_layer1` | `bool` | Whether to use layer1 (True) or layer2 (False). | `True` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. If None, 4 * hidden_dim is used. | `None` |
| `activation` | `str` | Activation function. | `'gelu'` |
| `dropout` | `float` | Dropout probability. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layer1` | `MixingLayer \| Module` | First mixing layer. |
| `layer2` | `MixingLayer \| Module` | Second mixing layer. |
| `use_layer1` | `bool` | Which layer to use for this block. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the alternating block. |
| `set_layer` | Set which layer to use. |
Functions¶
forward ¶
Forward pass through the alternating block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
set_layer ¶
Set which layer to use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `use_layer1` | `bool` | Whether to use layer1 (True) or layer2 (False). | required |
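To illustrate the even/odd pattern mentioned for AlternatingBlock, the sketch below derives one `use_layer1` flag per block from its depth in a stack. The helper `alternating_flags` is hypothetical and not part of spectrans. Each flag could be passed as `use_layer1` at construction or applied later through `set_layer`.

```python
def alternating_flags(num_blocks: int, start_with_layer1: bool = True) -> list[bool]:
    """Return one use_layer1 flag per block so that consecutive blocks alternate."""
    # Even depths take the starting layer, odd depths take the other one.
    return [(depth % 2 == 0) == start_with_layer1 for depth in range(num_blocks)]


print(alternating_flags(4))                           # [True, False, True, False]
print(alternating_flags(3, start_with_layer1=False))  # [False, True, False]
```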
AdaptiveBlock ¶
AdaptiveBlock(layers: list[MixingLayer | Module], hidden_dim: int, gate_type: str = 'soft', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that adaptively selects between mixing strategies.
This block uses a gating mechanism to dynamically choose or blend between different mixing strategies based on the input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers to choose from. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `gate_type` | `str` | Type of gating ('soft' for weighted sum, 'hard' for selection). | `'soft'` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. If None, 4 * hidden_dim is used. | `None` |
| `activation` | `str` | Activation function. | `'gelu'` |
| `dropout` | `float` | Dropout probability. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers. |
| `gate` | `Linear` | Gating network for layer selection. |
| `gate_type` | `str` | Type of gating mechanism. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the adaptive block. |
Functions¶
forward ¶
Forward pass through the adaptive block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
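The difference between 'soft' and 'hard' gating can be shown without tensors. The sketch below is plain Python over lists and is not the spectrans implementation. It assumes the gate produces one logit per mixing layer, that 'soft' blends the layer outputs with softmax weights, and that 'hard' keeps only the highest-scoring output. The names `softmax` and `gate_outputs` are hypothetical.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a flat list of gate logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_outputs(outputs: list[list[float]], logits: list[float], gate_type: str = "soft") -> list[float]:
    """Combine per-layer outputs: 'soft' blends them, 'hard' keeps the top-scoring one."""
    if gate_type == "soft":
        weights = softmax(logits)
        # Weighted sum of the layer outputs, feature by feature.
        return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*outputs)]
    if gate_type == "hard":
        # Index of the largest logit selects a single layer's output.
        best = max(range(len(logits)), key=logits.__getitem__)
        return list(outputs[best])
    raise ValueError(f"unknown gate_type: {gate_type!r}")


outputs = [[1.0, 2.0], [3.0, 4.0]]  # stand-ins for two mixing layers' outputs
print(gate_outputs(outputs, [0.0, 0.0], "soft"))  # equal logits give the average: [2.0, 3.0]
print(gate_outputs(outputs, [0.1, 0.9], "hard"))  # [3.0, 4.0]
```

Soft gating keeps every layer in the computation and stays differentiable with respect to the gate, while a hard selection as written here passes no gradient to the logits of the layers it did not pick.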
MultiscaleBlock ¶
MultiscaleBlock(layers: list[MixingLayer | Module], hidden_dim: int, fusion_type: str = 'add', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that processes multiple scales in parallel.
This block applies different mixing strategies at different scales and combines their outputs, capturing both local and global patterns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers for different scales. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `fusion_type` | `str` | How to fuse outputs ('concat', 'add', 'weighted'). | `'add'` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. If None, 4 * hidden_dim is used. | `None` |
| `activation` | `str` | Activation function. | `'gelu'` |
| `dropout` | `float` | Dropout probability. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers for different scales. |
| `fusion_type` | `str` | Type of fusion mechanism. |
| `fusion_weights` | `Parameter \| None` | Learnable weights for fusion (if fusion_type is 'weighted'). |
| `fusion_proj` | `Linear \| None` | Projection for concatenation fusion. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the multiscale block. |
Functions¶
forward ¶
Forward pass through the multiscale block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
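The three `fusion_type` options can be compared on a single feature vector. The sketch below uses plain Python lists rather than tensors and is an illustration under stated assumptions, not the library's code: 'add' is taken to be an element-wise sum, 'weighted' applies one scalar weight per scale without normalizing the weights, and 'concat' joins the outputs along the feature axis and projects them back to the original width. The function `fuse` and its `weights` and `proj` arguments are hypothetical stand-ins for `fusion_weights` and `fusion_proj`.

```python
from typing import Optional

Vector = list[float]


def fuse(
    outputs: list[Vector],
    fusion_type: str = "add",
    weights: Optional[list[float]] = None,
    proj: Optional[list[Vector]] = None,
) -> Vector:
    """Fuse per-scale outputs of equal length into one vector."""
    if fusion_type == "add":
        # Element-wise sum across scales.
        return [sum(vals) for vals in zip(*outputs)]
    if fusion_type == "weighted":
        if weights is None or len(weights) != len(outputs):
            raise ValueError("weighted fusion needs one weight per scale")
        # Element-wise sum with one scalar weight per scale.
        return [sum(w * v for w, v in zip(weights, vals)) for vals in zip(*outputs)]
    if fusion_type == "concat":
        if proj is None:
            raise ValueError("concat fusion needs a projection matrix")
        # Concatenate along the feature axis, then project back down.
        stacked = [v for out in outputs for v in out]
        return [sum(p * v for p, v in zip(row, stacked)) for row in proj]
    raise ValueError(f"unknown fusion_type: {fusion_type!r}")


scales = [[1.0, 2.0], [10.0, 20.0]]  # stand-ins for two mixing layers' outputs
print(fuse(scales, "add"))                           # [11.0, 22.0]
print(fuse(scales, "weighted", weights=[0.5, 0.5]))  # [5.5, 11.0]
# A 2x4 projection that keeps the first and last concatenated features.
print(fuse(scales, "concat", proj=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]))  # [1.0, 20.0]
```

Only the 'concat' path changes the feature width before fusing, which is why it needs a projection to return to hidden_dim.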
CascadeBlock ¶
CascadeBlock(layers: list[MixingLayer | Module], hidden_dim: int, share_norm: bool = False, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that cascades multiple mixing strategies.
This block applies mixing layers sequentially, allowing each layer to refine the representations produced by the previous one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers to cascade. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `share_norm` | `bool` | Whether to share normalization across layers. | `False` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. If None, 4 * hidden_dim is used. | `None` |
| `activation` | `str` | Activation function. | `'gelu'` |
| `dropout` | `float` | Dropout probability. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers to cascade. |
| `norms` | `ModuleList` | Normalization layers for each mixing layer. |
| `share_norm` | `bool` | Whether normalization is shared. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the cascade block. |
Functions¶
forward ¶
Forward pass through the cascade block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
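The cascading idea, where each mixing layer refines the output of the previous one, can be sketched in plain Python. This is not the spectrans implementation. It assumes a pre-norm residual update of the form x <- x + layer(norm(x)) at every step, and it models `share_norm` as reusing one normalization object for all layers. The helpers `build_norms` and `cascade` and the toy layers are hypothetical.

```python
from typing import Callable

Vector = list[float]
Op = Callable[[Vector], Vector]


def build_norms(num_layers: int, share_norm: bool, make_norm: Callable[[], Op]) -> list[Op]:
    """Return one norm per layer, or the same norm repeated when share_norm is True."""
    if share_norm:
        shared = make_norm()
        return [shared] * num_layers
    return [make_norm() for _ in range(num_layers)]


def cascade(x: Vector, layers: list[Op], norms: list[Op]) -> Vector:
    """Apply the layers in order, each as x <- x + layer(norm(x))."""
    for layer, norm in zip(layers, norms):
        mixed = layer(norm(x))
        x = [a + b for a, b in zip(x, mixed)]
    return x


# Toy stand-ins: an identity "norm" and two element-wise "mixing layers".
def make_identity_norm() -> Op:
    return lambda v: list(v)


def double(v: Vector) -> Vector:
    return [2.0 * a for a in v]


def add_one(v: Vector) -> Vector:
    return [a + 1.0 for a in v]


norms = build_norms(2, share_norm=False, make_norm=make_identity_norm)
print(cascade([1.0, 2.0], [double, add_one], norms))  # [7.0, 13.0]
```

Because the second layer sees the already-updated x, reordering the layers generally changes the result, which is the main difference from the parallel fusion used by MultiscaleBlock.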