Transformer Blocks¶
spectrans.blocks ¶
Transformer block implementations for spectral architectures.
This module provides transformer blocks that combine spectral mixing or attention layers with feedforward networks, residual connections, and normalization. The blocks implement different architectural patterns including pre-norm, post-norm, parallel, and hybrid configurations for various spectral transformer models.
Modules:
| Name | Description |
|---|---|
| `base` | Base classes for transformer blocks. |
| `hybrid` | Hybrid blocks combining multiple mixing strategies. |
| `spectral` | Spectral transformer blocks using frequency-domain methods. |
Classes:
| Name | Description |
|---|---|
| `AFNOBlock` | Adaptive Fourier Neural Operator block with mode truncation. |
| `AdaptiveBlock` | Block with adaptive routing between components. |
| `AlternatingBlock` | Alternates between different mixing strategies. |
| `CascadeBlock` | Cascades multiple blocks with different configurations. |
| `FeedForwardNetwork` | Standard MLP feedforward network. |
| `FNetBlock` | FNet-style block with Fourier mixing. |
| `FNO2DBlock` | 2D Fourier Neural Operator block for spatial data. |
| `FNOBlock` | 1D Fourier Neural Operator block. |
| `GFNetBlock` | Global Filter Network block with learnable filters. |
| `HybridBlock` | Combines multiple mixing strategies in parallel. |
| `LSTBlock` | Linear Spectral Transform block. |
| `MultiscaleBlock` | Multi-resolution processing with wavelets. |
| `ParallelBlock` | Parallel execution of mixing and feedforward. |
| `PostNormBlock` | Post-normalization transformer block. |
| `PreNormBlock` | Pre-normalization transformer block. |
| `SpectralAttentionBlock` | Block using spectral attention mechanisms. |
| `TransformerBlock` | Base class for all transformer blocks. |
| `WaveletBlock` | Block using wavelet transforms for mixing. |
Examples:
Using an FNet block:
>>> import torch
>>> from spectrans.blocks import FNetBlock
>>>
>>> block = FNetBlock(hidden_dim=768, ffn_hidden_dim=3072)
>>> x = torch.randn(32, 512, 768)
>>> output = block(x)
>>> assert output.shape == x.shape
Using a hybrid block with multiple mixing strategies:
>>> from spectrans.blocks import AlternatingBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> from spectrans.layers.mixing.wavelet import WaveletMixing
>>>
>>> layer1 = FourierMixing(hidden_dim=512)
>>> layer2 = WaveletMixing(hidden_dim=512, wavelet='db4')
>>> block = AlternatingBlock(layer1=layer1, layer2=layer2, hidden_dim=512)
>>> x_512 = torch.randn(32, 512, 512)  # last dimension must match hidden_dim=512
>>> output = block(x_512)
Using parallel execution:
>>> from spectrans.blocks import ParallelBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>>
>>> mixing = FourierMixing(hidden_dim=768)
>>> block = ParallelBlock(mixing_layer=mixing, hidden_dim=768)
>>> output = block(x)
Notes
Architectural Patterns:
- Pre-Norm: LayerNorm → Mixing → Residual → LayerNorm → FFN → Residual
- Post-Norm: Mixing → Residual → LayerNorm → FFN → Residual → LayerNorm
- Parallel: Mixing and FFN branches read the same input and share a single residual connection
- Hybrid: Multiple mixing strategies combined with learnable or fixed weights
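The Pre-Norm and Post-Norm orderings listed above can be sketched in plain PyTorch. This is a minimal illustration, not the spectrans implementation: `ToyBlock` and its `pre_norm` flag are hypothetical names, and `nn.Identity` stands in for a real mixing layer.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Illustrative block (not the spectrans class); `mixing` must preserve shape."""

    def __init__(self, mixing: nn.Module, hidden_dim: int, pre_norm: bool = True):
        super().__init__()
        self.mixing = mixing
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.pre_norm = pre_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.pre_norm:
            # Pre-Norm: normalize, transform, then add the residual.
            x = x + self.mixing(self.norm1(x))
            x = x + self.ffn(self.norm2(x))
        else:
            # Post-Norm: transform, add the residual, then normalize.
            x = self.norm1(x + self.mixing(x))
            x = self.norm2(x + self.ffn(x))
        return x


x = torch.randn(2, 16, 32)
y_pre = ToyBlock(nn.Identity(), 32, pre_norm=True)(x)
y_post = ToyBlock(nn.Identity(), 32, pre_norm=False)(x)
```

In the Post-Norm variant the block output is itself normalized, while in the Pre-Norm variant the residual stream passes through unnormalized.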
Complexity Comparison:
- Standard Transformer: \(O(n^2 d)\) per block
- FNet Block: \(O(nd \log n)\) per block
- GFNet Block: \(O(nd \log n)\) with learnable parameters
- Wavelet Block: \(O(nd)\) with multi-resolution analysis
- Hybrid Block: Weighted combination of component complexities
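To get a feel for these orders of growth, the leading terms can be compared for a concrete shape. The sketch below ignores constant factors and the FFN cost, so it is only a rough guide; the function names are illustrative.

```python
import math


def attention_cost(n: int, d: int) -> float:
    """Leading-order cost of self-attention mixing: n^2 * d."""
    return float(n * n * d)


def fourier_cost(n: int, d: int) -> float:
    """Leading-order cost of FFT-based mixing: n * d * log2(n)."""
    return n * d * math.log2(n)


def wavelet_cost(n: int, d: int) -> float:
    """Leading-order cost of a discrete wavelet transform: n * d."""
    return float(n * d)


n, d = 512, 768
ratio_attention_to_fourier = attention_cost(n, d) / fourier_cost(n, d)
ratio_fourier_to_wavelet = fourier_cost(n, d) / wavelet_cost(n, d)
```

For `n = 512` the first ratio is `n / log2(n) = 512 / 9`, roughly 57, and the second is `log2(n) = 9`.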
All blocks maintain:
- Residual connections for gradient flow
- LayerNorm for training stability
- Dropout for regularization
- Optional activation checkpointing
References
James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. 2022. FNet: Mixing tokens with Fourier transforms. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4296-4313, Seattle.
Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. 2021. Global filter networks for image classification. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 980-993.
See Also
spectrans.layers : Layer implementations used in blocks.
spectrans.models : Models built from these blocks.
spectrans.blocks.base : Base classes and interfaces.
Classes¶
FeedForwardNetwork ¶
FeedForwardNetwork(hidden_dim: int, ffn_hidden_dim: int, activation: str = 'gelu', dropout: float = 0.0)
Bases: Module
Standard feedforward network for transformer blocks.
A two-layer MLP with configurable activation function and dropout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Input and output dimension. | required |
| `ffn_hidden_dim` | `int` | Hidden dimension of the FFN. | required |
| `activation` | `str` | Activation function name. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `fc1` | `Linear` | First linear layer. |
| `fc2` | `Linear` | Second linear layer. |
| `activation` | `Module` | Activation function. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the FFN. |
Source code in spectrans/blocks/base.py
Functions¶
forward ¶
Forward pass through the FFN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (..., hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (..., hidden_dim). |
Source code in spectrans/blocks/base.py
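A two-layer MLP with the attributes listed above might look like the following sketch. The class name `ToyFeedForward`, the activation lookup table, and the placement of dropout between the activation and `fc2` are assumptions for illustration; the actual spectrans source may differ.

```python
import torch
from torch import nn


class ToyFeedForward(nn.Module):
    """Illustrative two-layer MLP: fc1 -> activation -> dropout -> fc2."""

    def __init__(self, hidden_dim: int, ffn_hidden_dim: int,
                 activation: str = "gelu", dropout: float = 0.0):
        super().__init__()
        # Hypothetical name-to-module table; the real library may support other names.
        activations = {"gelu": nn.GELU, "relu": nn.ReLU, "silu": nn.SiLU}
        self.fc1 = nn.Linear(hidden_dim, ffn_hidden_dim)
        self.activation = activations[activation]()
        self.dropout = nn.Dropout(dropout)
        self.fc2 = nn.Linear(ffn_hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # nn.Linear acts on the last dimension, so any leading shape is accepted.
        return self.fc2(self.dropout(self.activation(self.fc1(x))))


ffn = ToyFeedForward(hidden_dim=64, ffn_hidden_dim=256)
out_2d = ffn(torch.randn(3, 64))
out_4d = ffn(torch.randn(2, 5, 7, 64))
```

Because the linear layers operate on the trailing dimension only, the `(..., hidden_dim)` contract in the signature holds for inputs of any rank.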
ParallelBlock ¶
ParallelBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: SpectralComponent
Transformer block with parallel mixing and FFN branches.
This block processes the mixing layer and FFN in parallel rather than sequentially, which can improve efficiency and has been shown to work well in practice.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. |
| `ffn` | `FeedForwardNetwork` | The feedforward network. |
| `norm` | `LayerNorm` | Layer normalization. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the parallel block. |
Source code in spectrans/blocks/base.py
Functions¶
forward ¶
Forward pass through the parallel block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/base.py
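One plausible reading of the parallel pattern, given the single `norm` attribute, is that both branches read the same normalized input and their sum is added back once. The sketch below assumes exactly that; `ToyParallelBlock` and `ZeroBranch` are hypothetical names, and the real `forward` may order the operations differently.

```python
import torch
from torch import nn


class ZeroBranch(nn.Module):
    """Helper branch that contributes nothing, used to check the residual path."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.zeros_like(x)


class ToyParallelBlock(nn.Module):
    """Illustrative parallel block: x + dropout(mixing(norm(x)) + ffn(norm(x)))."""

    def __init__(self, mixing: nn.Module, ffn: nn.Module,
                 hidden_dim: int, dropout: float = 0.0):
        super().__init__()
        self.mixing_layer = mixing
        self.ffn = ffn
        self.norm = nn.LayerNorm(hidden_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        normed = self.norm(x)  # computed once, shared by both branches
        return x + self.dropout(self.mixing_layer(normed) + self.ffn(normed))


x = torch.randn(2, 16, 32)
y = ToyParallelBlock(nn.Identity(), nn.Linear(32, 32), hidden_dim=32)(x)
y_residual_only = ToyParallelBlock(ZeroBranch(), ZeroBranch(), hidden_dim=32)(x)
```

Since neither branch depends on the other's output, the two can in principle be scheduled concurrently, and only one normalization is needed per block.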
PostNormBlock ¶
PostNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: TransformerBlock
Transformer block with post-layer normalization.
This block applies layer normalization after the mixing layer and FFN, following the original transformer architecture.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/base.py
PreNormBlock ¶
PreNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: TransformerBlock
Transformer block with pre-layer normalization.
This block applies layer normalization before the mixing layer and FFN, which has been shown to improve training stability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/base.py
TransformerBlock ¶
TransformerBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, use_pre_norm: bool = True, norm_eps: float = 1e-12)
Bases: SpectralComponent
Base class for transformer blocks.
A transformer block combines a mixing/attention layer with a feedforward network, using residual connections and layer normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer for token interaction. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the feedforward network. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function for the FFN. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `use_pre_norm` | `bool` | Whether to use pre-layer normalization. Default is True. | `True` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. |
| `ffn` | `FeedForwardNetwork \| None` | The feedforward network. |
| `norm1` | `LayerNorm` | First layer normalization. |
| `norm2` | `LayerNorm \| None` | Second layer normalization (if FFN is used). |
| `dropout` | `Dropout` | Dropout layer. |
| `use_pre_norm` | `bool` | Whether pre-normalization is used. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the transformer block. |
Source code in spectrans/blocks/base.py
Functions¶
forward ¶
Forward pass through the transformer block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/base.py
AdaptiveBlock ¶
AdaptiveBlock(layers: list[MixingLayer | Module], hidden_dim: int, gate_type: str = 'soft', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that adaptively selects between mixing strategies.
This block uses a gating mechanism to dynamically choose or blend between different mixing strategies based on the input.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers to choose from. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `gate_type` | `str` | Type of gating ('soft' for weighted sum, 'hard' for selection). Default is 'soft'. | `'soft'` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers. |
| `gate` | `Linear` | Gating network for layer selection. |
| `gate_type` | `str` | Type of gating mechanism. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the adaptive block. |
Source code in spectrans/blocks/hybrid.py
Functions¶
forward ¶
Forward pass through the adaptive block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/hybrid.py
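The difference between 'soft' and 'hard' gating can be sketched as follows. This is an assumption-laden illustration: `ToyAdaptiveMixer` is a hypothetical name, and pooling the input over the sequence dimension before the gate is a guess about how the gate input is formed.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyAdaptiveMixer(nn.Module):
    """Illustrative gate over several shape-preserving mixing layers."""

    def __init__(self, layers: list, hidden_dim: int, gate_type: str = "soft"):
        super().__init__()
        if gate_type not in {"soft", "hard"}:
            raise ValueError(f"unknown gate_type: {gate_type}")
        self.layers = nn.ModuleList(layers)
        self.gate = nn.Linear(hidden_dim, len(layers))
        self.gate_type = gate_type

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: one gate decision per sequence, from the mean token.
        logits = self.gate(x.mean(dim=1))  # (batch, num_layers)
        if self.gate_type == "soft":
            weights = F.softmax(logits, dim=-1)  # blend all layers
        else:
            # Pick a single layer; argmax is not differentiable w.r.t. the gate.
            weights = F.one_hot(logits.argmax(dim=-1), len(self.layers)).to(x.dtype)
        outputs = torch.stack([layer(x) for layer in self.layers], dim=1)
        return (weights[:, :, None, None] * outputs).sum(dim=1)


x = torch.randn(4, 10, 32)
soft_out = ToyAdaptiveMixer([nn.Identity(), nn.Tanh()], 32, "soft")(x)
hard_out = ToyAdaptiveMixer([nn.Identity(), nn.Tanh()], 32, "hard")(x)
```

In both modes the weights for each sequence sum to one, so the output stays on the same scale as a single layer's output.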
AlternatingBlock ¶
AlternatingBlock(layer1: MixingLayer | Module, layer2: MixingLayer | Module, hidden_dim: int, use_layer1: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that alternates between two mixing strategies.
This block can be used in alternating patterns, e.g., even layers use one type of mixing and odd layers use another.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layer1` | `MixingLayer \| Module` | First mixing layer. | required |
| `layer2` | `MixingLayer \| Module` | Second mixing layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `use_layer1` | `bool` | Whether to use layer1 (True) or layer2 (False). Default is True. | `True` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layer1` | `MixingLayer \| Module` | First mixing layer. |
| `layer2` | `MixingLayer \| Module` | Second mixing layer. |
| `use_layer1` | `bool` | Which layer to use for this block. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the alternating block. |
| `set_layer` | Set which layer to use. |
Source code in spectrans/blocks/hybrid.py
Functions¶
forward ¶
Forward pass through the alternating block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/hybrid.py
set_layer ¶
Set which layer to use.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `use_layer1` | `bool` | Whether to use layer1 (True) or layer2 (False). | required |
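The even/odd usage described for this block can be sketched with a toy stand-in. `ToyAlternating` is a hypothetical class that mirrors only the `use_layer1` flag and `set_layer` method documented above; normalization and the FFN are omitted for brevity.

```python
import torch
from torch import nn


class ToyAlternating(nn.Module):
    """Illustrative block holding two mixing layers and a flag selecting one."""

    def __init__(self, layer1: nn.Module, layer2: nn.Module, use_layer1: bool = True):
        super().__init__()
        self.layer1 = layer1
        self.layer2 = layer2
        self.use_layer1 = use_layer1

    def set_layer(self, use_layer1: bool) -> None:
        self.use_layer1 = use_layer1

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        layer = self.layer1 if self.use_layer1 else self.layer2
        return x + layer(x)  # residual around the selected layer


# Even-indexed blocks use layer1, odd-indexed blocks use layer2.
blocks = nn.ModuleList(
    ToyAlternating(nn.Identity(), nn.Tanh(), use_layer1=(i % 2 == 0))
    for i in range(4)
)
pattern = [block.use_layer1 for block in blocks]

x = torch.randn(2, 8, 16)
blocks[0].set_layer(False)  # switch the first block to layer2 at run time
switched = blocks[0](x)
```

Each block here gets fresh layer instances; passing the same module objects to several blocks would share their parameters.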
CascadeBlock ¶
CascadeBlock(layers: list[MixingLayer | Module], hidden_dim: int, share_norm: bool = False, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that cascades multiple mixing strategies.
This block applies mixing layers sequentially, allowing each layer to refine the representations produced by the previous one.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers to cascade. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `share_norm` | `bool` | Whether to share normalization across layers. Default is False. | `False` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers to cascade. |
| `norms` | `ModuleList` | Normalization layers for each mixing layer. |
| `share_norm` | `bool` | Whether normalization is shared. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the cascade block. |
Source code in spectrans/blocks/hybrid.py
Functions¶
forward ¶
Forward pass through the cascade block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/hybrid.py
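Sequential refinement with optional norm sharing could look like the sketch below. `ToyCascade` is a hypothetical stand-in; applying a pre-norm residual step per layer is an assumption, as is the choice to store a single `LayerNorm` when `share_norm=True`.

```python
import torch
from torch import nn


class ToyCascade(nn.Module):
    """Illustrative cascade: each mixing layer refines the previous output."""

    def __init__(self, layers: list, hidden_dim: int, share_norm: bool = False):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        num_norms = 1 if share_norm else len(layers)
        self.norms = nn.ModuleList(nn.LayerNorm(hidden_dim) for _ in range(num_norms))
        self.share_norm = share_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for i, layer in enumerate(self.layers):
            norm = self.norms[0] if self.share_norm else self.norms[i]
            x = x + layer(norm(x))  # residual step around each mixing layer
        return x


mixers = [nn.Linear(32, 32) for _ in range(3)]
separate = ToyCascade(mixers, hidden_dim=32, share_norm=False)
shared = ToyCascade(mixers, hidden_dim=32, share_norm=True)

x = torch.randn(2, 16, 32)
y = separate(x)
```

Sharing the norm saves `2 * hidden_dim` parameters per extra layer at the cost of forcing all stages to use the same scale and shift.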
HybridBlock ¶
HybridBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: SpectralComponent
Base class for hybrid transformer blocks.
This class provides the foundation for blocks that combine multiple mixing strategies in various ways.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. |
| `ffn` | `FeedForwardNetwork \| None` | The feedforward network. |
| `dropout` | `Dropout` | Dropout layer. |
Source code in spectrans/blocks/hybrid.py
MultiscaleBlock ¶
MultiscaleBlock(layers: list[MixingLayer | Module], hidden_dim: int, fusion_type: str = 'add', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: HybridBlock
Transformer block that processes multiple scales in parallel.
This block applies different mixing strategies at different scales and combines their outputs, capturing both local and global patterns.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `layers` | `list[MixingLayer \| Module]` | List of mixing layers for different scales. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `fusion_type` | `str` | How to fuse outputs ('concat', 'add', 'weighted'). Default is 'add'. | `'add'` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `layers` | `ModuleList` | List of mixing layers for different scales. |
| `fusion_type` | `str` | Type of fusion mechanism. |
| `fusion_weights` | `Parameter \| None` | Learnable weights for fusion (if fusion_type is 'weighted'). |
| `fusion_proj` | `Linear \| None` | Projection for concatenation fusion. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the multiscale block. |
Source code in spectrans/blocks/hybrid.py
Functions¶
forward ¶
Forward pass through the multiscale block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
Source code in spectrans/blocks/hybrid.py
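The three fusion modes can be illustrated with a small helper that combines a list of branch outputs. `ToyFusion` is a hypothetical name; normalizing the learnable weights with a softmax and initializing them uniformly are assumptions, not documented behavior.

```python
import torch
import torch.nn.functional as F
from torch import nn


class ToyFusion(nn.Module):
    """Illustrative fusion of same-shaped branch outputs."""

    def __init__(self, num_branches: int, hidden_dim: int, fusion_type: str = "add"):
        super().__init__()
        if fusion_type not in {"add", "weighted", "concat"}:
            raise ValueError(f"unknown fusion_type: {fusion_type}")
        self.fusion_type = fusion_type
        # Only the parameters needed by the chosen mode are created.
        self.fusion_weights = (
            nn.Parameter(torch.zeros(num_branches)) if fusion_type == "weighted" else None
        )
        self.fusion_proj = (
            nn.Linear(num_branches * hidden_dim, hidden_dim)
            if fusion_type == "concat" else None
        )

    def forward(self, outputs: list) -> torch.Tensor:
        if self.fusion_type == "add":
            return torch.stack(outputs).sum(dim=0)
        if self.fusion_type == "weighted":
            weights = F.softmax(self.fusion_weights, dim=0)  # sums to 1
            return (weights[:, None, None, None] * torch.stack(outputs)).sum(dim=0)
        # 'concat': join along the feature axis, then project back to hidden_dim.
        return self.fusion_proj(torch.cat(outputs, dim=-1))


x = torch.randn(2, 16, 32)
added = ToyFusion(2, 32, "add")([x, x])
weighted = ToyFusion(2, 32, "weighted")([x, x])
concatenated = ToyFusion(2, 32, "concat")([x, x])
```

Only 'concat' changes the feature width internally, which is why it needs the extra projection to return to `hidden_dim`.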
AFNOBlock ¶
AFNOBlock(hidden_dim: int, sequence_length: int, modes: int | None = None, mlp_hidden_dim: int | None = None, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
AFNO transformer block with adaptive Fourier neural operator.
This block uses adaptive Fourier mode selection with MLPs in the frequency domain for token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `sequence_length` | `int` | Maximum sequence length. | required |
| `modes` | `int \| None` | Number of Fourier modes to retain. Default is sequence_length // 2. | `None` |
| `mlp_hidden_dim` | `int \| None` | Hidden dimension of the frequency-domain MLP. Default is hidden_dim. | `None` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
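The mode-truncation step alone (without the frequency-domain MLP) can be sketched with `torch.fft`. The helper name `truncate_modes` is hypothetical; only the default of `sequence_length // 2` retained modes is taken from the parameter description above.

```python
import torch


def truncate_modes(x: torch.Tensor, modes=None) -> torch.Tensor:
    """Keep the lowest `modes` frequencies along the sequence axis, zero the rest."""
    seq_len = x.shape[1]
    if modes is None:
        modes = seq_len // 2  # documented default
    x_freq = torch.fft.rfft(x, dim=1)  # (batch, seq_len // 2 + 1, hidden)
    x_freq[:, modes:, :] = 0           # discard high-frequency modes
    return torch.fft.irfft(x_freq, n=seq_len, dim=1)


# A constant sequence only has a DC component, so one mode reproduces it.
constant = 3.0 * torch.ones(1, 8, 4)
constant_out = truncate_modes(constant, modes=1)

# An alternating sequence lives entirely at the Nyquist frequency (index 4 of 5).
alternating = torch.tensor([1.0, -1.0] * 4).view(1, 8, 1).repeat(1, 1, 4)
alternating_out = truncate_modes(alternating)  # default keeps modes 0..3
```

With the default setting only the single highest (Nyquist) mode is dropped for an even sequence length; smaller `modes` values act as a stronger low-pass filter.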
FNetBlock ¶
FNetBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
FNet transformer block with Fourier mixing.
This block uses Fourier transforms for token mixing, providing a parameter-free alternative to attention with \(O(n \log n)\) complexity in the sequence length \(n\).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
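The mixing operation from the FNet paper takes a DFT along the hidden dimension, then along the sequence dimension, and keeps the real part. A minimal sketch follows; whether the spectrans `FourierMixing` layer matches this exactly is an assumption, and `fourier_mixing` is an illustrative name.

```python
import torch


def fourier_mixing(x: torch.Tensor) -> torch.Tensor:
    """FNet-style token mixing: real part of a 2D DFT over (sequence, hidden)."""
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


x = torch.randn(2, 16, 32)
mixed = fourier_mixing(x)

# The two nested 1D transforms are equivalent to one 2D transform.
mixed_fft2 = torch.fft.fft2(x).real

# For an all-ones input, all energy lands in the zero-frequency bin.
ones_out = fourier_mixing(torch.ones(1, 4, 4))
```

The operation has no learnable parameters, so all trainable capacity of such a block sits in the FFN and normalization layers.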
FNO2DBlock ¶
FNO2DBlock(hidden_dim: int, modes_h: int | None = None, modes_w: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
2D FNO transformer block for image or grid data.
This block uses 2D Fourier neural operators for spatial data processing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension (number of channels). | required |
| `modes_h` | `int \| None` | Number of Fourier modes for height. Default is 16. | `None` |
| `modes_w` | `int \| None` | Number of Fourier modes for width. Default is 16. | `None` |
| `num_layers` | `int` | Number of FNO layers. Default is 1. | `1` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
FNOBlock ¶
FNOBlock(hidden_dim: int, modes: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
FNO transformer block with Fourier neural operator.
This block uses Fourier neural operators for learning mappings between function spaces.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `modes` | `int \| None` | Number of Fourier modes. Default is 16. | `None` |
| `num_layers` | `int` | Number of FNO layers. Default is 1. | `1` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
GFNetBlock ¶
GFNetBlock(hidden_dim: int, sequence_length: int, ffn_hidden_dim: int | None = None, filter_activation: str = 'sigmoid', filter_init_std: float = 0.02, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
GFNet transformer block with global filter mixing.
This block uses learnable frequency-domain filters for token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `sequence_length` | `int` | Maximum sequence length. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `filter_activation` | `str` | Activation for filters ('sigmoid', 'tanh', or 'identity'). Default is 'sigmoid'. | `'sigmoid'` |
| `filter_init_std` | `float` | Initialization std for filters. Default is 0.02. | `0.02` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
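A learnable frequency-domain filter in the spirit of GFNet can be sketched as: transform along the sequence axis, multiply elementwise by a complex weight, transform back. `ToyGlobalFilter` is a hypothetical class; it omits the `filter_activation` option documented above, and storing the filter as a real tensor with a trailing dimension of 2 is an implementation choice made here.

```python
import torch
from torch import nn


class ToyGlobalFilter(nn.Module):
    """Illustrative global filter: irfft(rfft(x) * learnable_complex_filter)."""

    def __init__(self, sequence_length: int, hidden_dim: int, init_std: float = 0.02):
        super().__init__()
        num_freqs = sequence_length // 2 + 1  # size of the one-sided spectrum
        # Last axis holds (real, imaginary) parts of each filter coefficient.
        self.filter = nn.Parameter(torch.randn(num_freqs, hidden_dim, 2) * init_std)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.shape[1]
        x_freq = torch.fft.rfft(x, dim=1, norm="ortho")
        x_freq = x_freq * torch.view_as_complex(self.filter)
        return torch.fft.irfft(x_freq, n=seq_len, dim=1, norm="ortho")


layer = ToyGlobalFilter(sequence_length=16, hidden_dim=8)
x = torch.randn(2, 16, 8)
y = layer(x)
```

The filter holds one complex coefficient per frequency and channel, so its size depends on `sequence_length`, which is why the block takes that argument at construction time.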
LSTBlock ¶
LSTBlock(hidden_dim: int, num_heads: int = 8, transform_type: str = 'dct', use_scaling: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
LST transformer block with linear spectral transform attention.
This block uses orthogonal transforms (DCT, DST, or Hadamard) for attention computation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `num_heads` | `int` | Number of attention heads. Default is 8. | `8` |
| `transform_type` | `str` | Type of transform ('dct', 'dst', or 'hadamard'). Default is 'dct'. | `'dct'` |
| `use_scaling` | `bool` | Whether to use learnable scaling. Default is True. | `True` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
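The orthogonal transforms mentioned above share one useful property: the inverse is the transpose. The sketch below builds an orthonormal DCT-II matrix explicitly and applies a diagonal scaling in the transform domain. The function names and the way scaling is applied are illustrative assumptions, not the library's attention computation.

```python
import math

import torch


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II matrix C of shape (n, n), so that C @ C.T == I."""
    k = torch.arange(n, dtype=torch.float64)[:, None]  # frequency index
    i = torch.arange(n, dtype=torch.float64)[None, :]  # position index
    mat = math.sqrt(2.0 / n) * torch.cos(math.pi * (i + 0.5) * k / n)
    mat[0] /= math.sqrt(2.0)  # rescale the DC row for orthonormality
    return mat


def spectral_scale(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Transform along the sequence axis, scale each coefficient, transform back."""
    c = dct_matrix(x.shape[1])
    coeffs = c @ x                       # (batch, seq, hidden) in the DCT domain
    return c.T @ (scale[:, None] * coeffs)


c = dct_matrix(8)
x = torch.randn(2, 8, 4, dtype=torch.float64)
identity_out = spectral_scale(x, torch.ones(8, dtype=torch.float64))
```

An explicit matrix costs \(O(n^2)\) per channel; fast DCT algorithms reduce this to \(O(n \log n)\), but the matrix form makes the orthogonality easy to check.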
SpectralAttentionBlock ¶
SpectralAttentionBlock(hidden_dim: int, num_heads: int = 8, num_features: int | None = None, kernel_type: str = 'gaussian', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
Spectral attention transformer block.
This block uses spectral attention with random Fourier features for kernel approximation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `num_heads` | `int` | Number of attention heads. Default is 8. | `8` |
| `num_features` | `int \| None` | Number of random features. Default is 256. | `None` |
| `kernel_type` | `str` | Type of kernel ('gaussian' or 'laplacian'). Default is 'gaussian'. | `'gaussian'` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Source code in spectrans/blocks/spectral.py
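Random Fourier features approximate a shift-invariant kernel by an inner product of explicit feature maps. The sketch below shows this for a unit-bandwidth Gaussian kernel; the function name is illustrative, and the example uses far more features than the documented default of 256 so that the approximation error is easy to see shrinking.

```python
import math

import torch


def random_fourier_features(x: torch.Tensor, weights: torch.Tensor,
                            bias: torch.Tensor) -> torch.Tensor:
    """Map (n, d) inputs to (n, D) features with phi(x) . phi(y) ~ exp(-|x - y|^2 / 2)."""
    num_features = weights.shape[1]
    return math.sqrt(2.0 / num_features) * torch.cos(x @ weights + bias)


torch.manual_seed(0)
d, num_features = 4, 8192
weights = torch.randn(d, num_features)         # frequencies drawn from N(0, I)
bias = 2 * math.pi * torch.rand(num_features)  # phases drawn from U(0, 2*pi)

x = 0.5 * torch.randn(8, d)
y = 0.5 * torch.randn(8, d)

phi_x = random_fourier_features(x, weights, bias)
phi_y = random_fourier_features(y, weights, bias)
approx = phi_x @ phi_y.T                        # (8, 8) approximate kernel matrix
exact = torch.exp(-torch.cdist(x, y) ** 2 / 2)  # exact Gaussian kernel
max_error = (approx - exact).abs().max().item()
```

The approximation error decreases roughly as one over the square root of the number of features, which is the trade-off the `num_features` parameter controls.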
WaveletBlock ¶
WaveletBlock(hidden_dim: int, wavelet: str = 'db4', levels: int = 3, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: PreNormBlock
Wavelet transformer block with wavelet mixing.
This block uses discrete wavelet transforms for multiscale token mixing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `wavelet` | `str` | Type of wavelet. Default is 'db4'. | `'db4'` |
| `levels` | `int` | Number of decomposition levels. Default is 3. | `3` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
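A multi-level discrete wavelet decomposition along the sequence axis can be sketched with the Haar wavelet, which is used here in place of the default 'db4' because its filters are short enough to write inline. The function names are illustrative, and the sequence length is assumed to be divisible by `2 ** levels`.

```python
import math

import torch


def haar_dwt(x: torch.Tensor, levels: int = 3):
    """Decompose (batch, seq, hidden) into a coarse approximation and per-level details."""
    details = []
    approx = x
    for _ in range(levels):
        even, odd = approx[:, 0::2], approx[:, 1::2]
        details.append((even - odd) / math.sqrt(2))  # high-pass: local differences
        approx = (even + odd) / math.sqrt(2)         # low-pass: local averages
    return approx, details


def haar_idwt(approx: torch.Tensor, details: list) -> torch.Tensor:
    """Invert haar_dwt by rebuilding each level from coarse to fine."""
    for detail in reversed(details):
        even = (approx + detail) / math.sqrt(2)
        odd = (approx - detail) / math.sqrt(2)
        # Interleave even and odd samples back along the sequence axis.
        approx = torch.stack((even, odd), dim=2).flatten(1, 2)
    return approx


x = torch.randn(2, 16, 8)
coarse, details = haar_dwt(x, levels=3)
reconstructed = haar_idwt(coarse, details)
```

Each level halves the sequence it works on, so the total work is proportional to `n/2 + n/4 + ...`, which stays below `n` per channel and is consistent with the \(O(nd)\) figure quoted for wavelet blocks.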