Spectral Blocks
spectrans.blocks.spectral
Spectral transformer blocks.
This module provides pre-configured transformer blocks for different spectral architectures, combining various mixing and attention layers with feedforward networks.
Classes:
| Name | Description |
|---|---|
| FNetBlock | Transformer block using Fourier mixing (FNet architecture). |
| GFNetBlock | Transformer block using global filter mixing. |
| AFNOBlock | Transformer block using adaptive Fourier neural operator. |
| SpectralAttentionBlock | Transformer block using spectral attention with RFF. |
| LSTBlock | Transformer block using linear spectral transform attention. |
| WaveletBlock | Transformer block using wavelet mixing. |
| FNOBlock | Transformer block using Fourier neural operator layers. |
Examples:
Creating different spectral blocks:
>>> from spectrans.blocks.spectral import FNetBlock, GFNetBlock
>>> fnet_block = FNetBlock(hidden_dim=768, dropout=0.1)
>>> gfnet_block = GFNetBlock(hidden_dim=768, sequence_length=512)
Notes
Each spectral block implements a different mixing strategy:
- FNetBlock: Uses FFT for token mixing with \(O(n \log n)\) complexity
- GFNetBlock: Applies learnable filters in the frequency domain
- AFNOBlock: Selects Fourier modes adaptively
- SpectralAttentionBlock: Approximates kernels using random features
- LSTBlock: Uses orthogonal transforms (DCT, DST, or Hadamard)
- WaveletBlock: Performs multi-resolution decomposition
- FNOBlock: Implements neural operators in the frequency domain
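Every class on this page lists PreNormBlock as its base, and only the token mixer changes between them. The sketch below assumes the conventional pre-norm residual layout (normalize, apply the sublayer, add the residual); this page does not spell out the layout, so treat it as an assumption. The names `layer_norm`, `pre_norm_block`, `mean_mixer`, and `zero_ffn` are hypothetical, and the code is plain Python rather than the library's implementation.

```python
import math

def layer_norm(vec, eps=1e-12):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def pre_norm_block(x, mixer, ffn, eps=1e-12):
    """Assumed pre-norm layout: x + mixer(norm(x)), then x + ffn(norm(x)).

    x is a list of token vectors. mixer maps the whole sequence to a new
    sequence, while ffn is applied to each token independently.
    """
    normed = [layer_norm(tok, eps) for tok in x]
    mixed = mixer(normed)
    x = [[a + b for a, b in zip(tok, m)] for tok, m in zip(x, mixed)]
    normed = [layer_norm(tok, eps) for tok in x]
    out = [ffn(tok) for tok in normed]
    return [[a + b for a, b in zip(tok, o)] for tok, o in zip(x, out)]

def mean_mixer(seq):
    """Toy token mixer: replace every token with the sequence average."""
    n = len(seq)
    avg = [sum(tok[j] for tok in seq) / n for j in range(len(seq[0]))]
    return [avg[:] for _ in seq]

def zero_ffn(tok):
    """Toy feedforward network that contributes nothing."""
    return [0.0 for _ in tok]

x = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
y = pre_norm_block(x, mean_mixer, zero_ffn)
```

Swapping `mean_mixer` for a Fourier, wavelet, or attention-style function is the only change needed to move between the block types in this sketch.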
References
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. 2022. FNet: Mixing tokens with Fourier transforms. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4296-4313, Seattle.
Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. 2021. Global filter networks for image classification. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 980-993.
John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, and Bryan Catanzaro. 2022. Adaptive Fourier neural operators: Efficient token mixers for transformers. In Proceedings of the International Conference on Learning Representations (ICLR).
Classes
FNetBlock
FNetBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
FNet transformer block with Fourier mixing.
This block uses Fourier transforms for token mixing, providing an alternative to attention with \(O(n \log n)\) complexity.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
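To make the Fourier mixing step concrete, the sketch below follows the formulation in the FNet paper: a DFT along the hidden axis, a DFT along the sequence axis, and then the real part. Whether the library's mixing layer matches this exactly is an assumption. The helper names `dft` and `fourier_mix` are hypothetical, and the naive transform here costs \(O(n^2)\); an FFT is what yields the \(O(n \log n)\) figure quoted above.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def fourier_mix(x):
    """DFT along the hidden axis, then along the sequence axis; keep the real part."""
    rows = [dft(tok) for tok in x]  # transform each token over its hidden axis
    seq_len, hidden = len(rows), len(rows[0])
    # transform each hidden channel over the sequence axis
    cols = [dft([rows[t][j] for t in range(seq_len)]) for j in range(hidden)]
    return [[cols[j][k].real for j in range(hidden)] for k in range(seq_len)]

# A single non-zero entry in the first token spreads to every output position.
x = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
mixed = fourier_mix(x)
```

The impulse input shows the key property: the mixing has no learnable parameters, yet every output position depends on every input token.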
GFNetBlock
GFNetBlock(hidden_dim: int, sequence_length: int, ffn_hidden_dim: int | None = None, filter_activation: str = 'sigmoid', filter_init_std: float = 0.02, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
GFNet transformer block with global filter mixing.
This block uses learnable frequency-domain filters for token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| sequence_length | int | Maximum sequence length. | required |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| filter_activation | str | Activation for filters ('sigmoid', 'tanh', or 'identity'). Default is 'sigmoid'. | 'sigmoid' |
| filter_init_std | float | Initialization std for filters. Default is 0.02. | 0.02 |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
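Global filter mixing can be pictured as three steps: transform a channel to the frequency domain, multiply elementwise by a filter, and transform back. The sketch below shows this for a single channel with fixed filters. The names `dft` and `global_filter` are hypothetical, and how the library parameterizes, activates, or shapes its learnable filters is not shown on this page, so none of that is modeled here.

```python
import cmath

def dft(values, inverse=False):
    """Naive discrete Fourier transform; the inverse divides by n."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def global_filter(signal, filt):
    """Multiply the spectrum of `signal` elementwise by `filt`, then invert."""
    spectrum = dft(signal)
    filtered = [s * f for s, f in zip(spectrum, filt)]
    return [v.real for v in dft(filtered, inverse=True)]

signal = [1.0, 2.0, 3.0, 4.0]
identity = global_filter(signal, [1.0] * 4)             # all-pass filter
dc_only = global_filter(signal, [1.0, 0.0, 0.0, 0.0])   # keep only the mean
```

Because the filter has one entry per frequency, its size is tied to the sequence length, which is consistent with `sequence_length` being a required argument of this block.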
AFNOBlock
AFNOBlock(hidden_dim: int, sequence_length: int, modes: int | None = None, mlp_hidden_dim: int | None = None, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
AFNO transformer block with adaptive Fourier neural operator.
This block uses adaptive Fourier mode selection with MLPs in the frequency domain for token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| sequence_length | int | Maximum sequence length. | required |
| modes | int \| None | Number of Fourier modes to retain. If None, defaults to sequence_length // 2. | None |
| mlp_hidden_dim | int \| None | Hidden dimension of the frequency-domain MLP. If None, defaults to hidden_dim. | None |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
SpectralAttentionBlock
SpectralAttentionBlock(hidden_dim: int, num_heads: int = 8, num_features: int | None = None, kernel_type: str = 'gaussian', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
Spectral attention transformer block.
This block uses spectral attention with random Fourier features for kernel approximation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| num_heads | int | Number of attention heads. Default is 8. | 8 |
| num_features | int \| None | Number of random features. If None, defaults to 256. | None |
| kernel_type | str | Type of kernel ('gaussian' or 'laplacian'). Default is 'gaussian'. | 'gaussian' |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
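The kernel approximation behind random Fourier features can be checked numerically. The sketch below uses the standard construction for a unit-bandwidth Gaussian kernel: frequencies drawn from a standard normal, phases drawn uniformly from \([0, 2\pi)\), and cosine features scaled by \(\sqrt{2/D}\). The function name `rff_features` is hypothetical, and how the library samples its features or feeds them into attention is not documented here, so this illustrates only the kernel identity. It uses 2000 features instead of the documented default of 256 to keep the estimate tight.

```python
import math
import random

def rff_features(x, weights, offsets):
    """Map x to random Fourier features z(x) so that z(x) . z(y) approximates k(x, y)."""
    scale = math.sqrt(2.0 / len(weights))
    return [
        scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for w, b in zip(weights, offsets)
    ]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim, num_features = 3, 2000
# Frequencies from N(0, I) correspond to a Gaussian kernel with unit bandwidth.
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]

x = [0.1, 0.2, 0.3]
y = [0.3, 0.1, 0.0]
zx = rff_features(x, weights, offsets)
zy = rff_features(y, weights, offsets)
approx = sum(a * b for a, b in zip(zx, zy))
exact = math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / 2.0)
```

The approximation error shrinks roughly as one over the square root of the number of features, which is the trade-off `num_features` controls.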
LSTBlock
LSTBlock(hidden_dim: int, num_heads: int = 8, transform_type: str = 'dct', use_scaling: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
LST transformer block with linear spectral transform attention.
This block uses orthogonal transforms (DCT, DST, or Hadamard) for attention computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| num_heads | int | Number of attention heads. Default is 8. | 8 |
| transform_type | str | Type of transform ('dct', 'dst', or 'hadamard'). Default is 'dct'. | 'dct' |
| use_scaling | bool | Whether to use learnable scaling. Default is True. | True |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
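What makes an orthogonal transform convenient is that its inverse is simply its transpose. The sketch below builds an orthonormal DCT-II matrix, applies it, scales the coefficients, and inverts with the transpose. The assumption that the learnable scaling acts as a per-coefficient multiplier in the transform domain is mine and is not confirmed by this page. The helper names are hypothetical, and the scaling is fixed at one so the round trip can be verified.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: each row is one cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n)])
    return rows

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

C = dct_matrix(4)
signal = [1.0, 2.0, 3.0, 4.0]
coeffs = matvec(C, signal)                       # forward transform
scaling = [1.0, 1.0, 1.0, 1.0]                   # stand-in for learnable scaling
scaled = [c * s for c, s in zip(coeffs, scaling)]
restored = matvec(transpose(C), scaled)          # inverse is the transpose
energy_in = sum(v * v for v in signal)
energy_out = sum(c * c for c in coeffs)
```

With unit scaling the signal is recovered exactly and its energy is preserved, so any change in the output of such a layer comes from the scaling alone.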
WaveletBlock
WaveletBlock(hidden_dim: int, wavelet: str = 'db4', levels: int = 3, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
Wavelet transformer block with wavelet mixing.
This block uses discrete wavelet transforms for multiscale token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| wavelet | str | Type of wavelet. Default is 'db4'. | 'db4' |
| levels | int | Number of decomposition levels. Default is 3. | 3 |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
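The effect of `levels` is easiest to see in a small multi-level decomposition. The sketch below uses the Haar wavelet rather than the default 'db4', purely because Haar filters fit in a few lines; the multi-resolution structure is the same idea. All function names are hypothetical, and the input length is assumed to be divisible by `2 ** levels`.

```python
import math

def haar_step(signal):
    """One Haar analysis step: scaled pairwise sums (approx) and differences (detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Repeatedly split the approximation; details are stored finest first."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert the decomposition, starting from the coarsest level."""
    s = math.sqrt(2.0)
    for detail in reversed(details):
        merged = []
        for a, d in zip(approx, detail):
            merged.extend([(a + d) / s, (a - d) / s])
        approx = merged
    return approx

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(signal, levels=3)
restored = haar_reconstruct(approx, details)
```

Each level halves the resolution of the approximation, so three levels on eight samples leave one coarse coefficient plus detail bands of length 4, 2, and 1.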
FNOBlock
FNOBlock(hidden_dim: int, modes: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
FNO transformer block with Fourier neural operator.
This block uses Fourier neural operators for learning mappings between function spaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension of the model. | required |
| modes | int \| None | Number of Fourier modes. If None, defaults to 16. | None |
| num_layers | int | Number of FNO layers. Default is 1. | 1 |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |
Source code in spectrans/blocks/spectral.py
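The role of `modes` can be illustrated with a one-channel spectral convolution in the style of the Fourier neural operator literature: only the lowest few frequency modes are weighted and kept, and the rest are dropped. This is a sketch under that assumption and not the library's layer. The names `dft` and `spectral_conv` are hypothetical, the weights are fixed where a real layer would learn them, and `len(weights)` is assumed to be at most `n // 2 + 1`.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive discrete Fourier transform; the inverse divides by n."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def spectral_conv(signal, weights):
    """Weight the lowest len(weights) modes, zero the rest, and invert."""
    n = len(signal)
    spectrum = dft(signal)
    out = [0j] * n
    for k, w in enumerate(weights):
        out[k] = spectrum[k] * w
        if 0 < k < n - k:
            # Mirror onto the conjugate frequency so the result stays real.
            out[n - k] = spectrum[n - k] * w.conjugate()
    return [v.real for v in dft(out, inverse=True)]

# A constant offset plus a frequency-3 oscillation on 8 samples.
signal = [1.0 + math.cos(2.0 * math.pi * 3 * t / 8) for t in range(8)]
weights = [1 + 0j, 1 + 0j]  # keep modes 0 and 1 unchanged
smooth = spectral_conv(signal, weights)
```

With two modes retained, the frequency-3 component is discarded and only the constant offset survives, which shows how a small `modes` value acts as a low-pass constraint.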
FNO2DBlock
FNO2DBlock(hidden_dim: int, modes_h: int | None = None, modes_w: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
2D FNO transformer block for image or grid data.
This block uses 2D Fourier neural operators for spatial data processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_dim | int | Hidden dimension (number of channels). | required |
| modes_h | int \| None | Number of Fourier modes for height. If None, defaults to 16. | None |
| modes_w | int \| None | Number of Fourier modes for width. If None, defaults to 16. | None |
| num_layers | int | Number of FNO layers. Default is 1. | 1 |
| ffn_hidden_dim | int \| None | Hidden dimension of the FFN. If None, defaults to 4 * hidden_dim. | None |
| activation | str | Activation function. Default is 'gelu'. | 'gelu' |
| dropout | float | Dropout probability. Default is 0.0. | 0.0 |
| norm_eps | float | Epsilon for layer normalization. Default is 1e-12. | 1e-12 |