Skip to content

Spectral Blocks

spectrans.blocks.spectral

Spectral transformer blocks.

This module provides pre-configured transformer blocks for different spectral architectures, combining various mixing and attention layers with feedforward networks.

Classes:

Name Description
FNetBlock

Transformer block using Fourier mixing (FNet architecture).

GFNetBlock

Transformer block using global filter mixing.

AFNOBlock

Transformer block using adaptive Fourier neural operator.

SpectralAttentionBlock

Transformer block using spectral attention with RFF.

LSTBlock

Transformer block using linear spectral transform attention.

WaveletBlock

Transformer block using wavelet mixing.

FNOBlock

Transformer block using Fourier neural operator layers.

Examples:

Creating different spectral blocks:

>>> from spectrans.blocks.spectral import FNetBlock, GFNetBlock
>>> fnet_block = FNetBlock(hidden_dim=768, dropout=0.1)
>>> gfnet_block = GFNetBlock(hidden_dim=768, sequence_length=512)
Notes

Each spectral block implements different mixing strategies: - FNetBlock: Uses FFT for token mixing with \(O(n \log n)\) complexity - GFNetBlock: Applies learnable filters in frequency domain - AFNOBlock: Selects Fourier modes adaptively - SpectralAttentionBlock: Approximates kernels using random features - LSTBlock: Uses orthogonal transforms (DCT, DST, or Hadamard) - WaveletBlock: Performs multi-resolution decomposition - FNOBlock: Implements neural operators in frequency domain

References

James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. 2022. FNet: Mixing tokens with Fourier transforms. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4296-4313, Seattle.

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. 2021. Global filter networks for image classification. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 980-993.

John Guibas, Morteza Mardani, Zongyi Li, Andrew Tao, Anima Anandkumar, and Bryan Catanzaro. 2022. Adaptive Fourier neural operators: Efficient token mixers for transformers. In Proceedings of the International Conference on Learning Representations (ICLR).

Classes

FNetBlock

FNetBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

FNet transformer block with Fourier mixing.

This block uses Fourier transforms for token mixing, providing an alternative to attention with \(O(n \log n)\) complexity.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = FourierMixing(hidden_dim=hidden_dim)
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

GFNetBlock

GFNetBlock(hidden_dim: int, sequence_length: int, ffn_hidden_dim: int | None = None, filter_activation: str = 'sigmoid', filter_init_std: float = 0.02, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

GFNet transformer block with global filter mixing.

This block uses learnable frequency-domain filters for token mixing.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
sequence_length int

Maximum sequence length.

required
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
filter_activation str

Activation for filters ('sigmoid', 'tanh', or 'identity'). Default is 'sigmoid'.

'sigmoid'
filter_init_std float

Initialization std for filters. Default is 0.02.

0.02
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    sequence_length: int,
    ffn_hidden_dim: int | None = None,
    filter_activation: str = "sigmoid",
    filter_init_std: float = 0.02,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = GlobalFilterMixing(
        hidden_dim=hidden_dim,
        sequence_length=sequence_length,
        activation=cast(ActivationType, filter_activation),
        filter_init_std=filter_init_std,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

AFNOBlock

AFNOBlock(hidden_dim: int, sequence_length: int, modes: int | None = None, mlp_hidden_dim: int | None = None, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

AFNO transformer block with adaptive Fourier neural operator.

This block uses adaptive Fourier mode selection with MLPs in the frequency domain for token mixing.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
sequence_length int

Maximum sequence length.

required
modes int | None

Number of Fourier modes to retain. Default is sequence_length // 2.

None
mlp_hidden_dim int | None

Hidden dimension of the frequency-domain MLP. Default is hidden_dim.

None
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    sequence_length: int,
    modes: int | None = None,
    mlp_hidden_dim: int | None = None,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    # Determine mlp_ratio from mlp_hidden_dim if provided
    mlp_ratio = mlp_hidden_dim / hidden_dim if mlp_hidden_dim is not None else 2.0

    mixing_layer = AFNOMixing(
        hidden_dim=hidden_dim,
        max_sequence_length=sequence_length,
        modes_seq=modes,
        modes_hidden=modes,
        mlp_ratio=mlp_ratio,
        activation=cast(ActivationType, activation),
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

SpectralAttentionBlock

SpectralAttentionBlock(hidden_dim: int, num_heads: int = 8, num_features: int | None = None, kernel_type: str = 'gaussian', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

Spectral attention transformer block.

This block uses spectral attention with random Fourier features for kernel approximation.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
num_heads int

Number of attention heads. Default is 8.

8
num_features int | None

Number of random features. Default is 256.

None
kernel_type str

Type of kernel ('gaussian' or 'laplacian'). Default is 'gaussian'.

'gaussian'
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    num_heads: int = 8,
    num_features: int | None = None,
    kernel_type: str = "gaussian",
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = SpectralAttention(
        hidden_dim=hidden_dim,
        num_heads=num_heads,
        num_features=num_features,
        kernel_type=cast(KernelType, kernel_type),
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

LSTBlock

LSTBlock(hidden_dim: int, num_heads: int = 8, transform_type: str = 'dct', use_scaling: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

LST transformer block with linear spectral transform attention.

This block uses orthogonal transforms (DCT, DST, or Hadamard) for attention computation.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
num_heads int

Number of attention heads. Default is 8.

8
transform_type str

Type of transform ('dct', 'dst', or 'hadamard'). Default is 'dct'.

'dct'
use_scaling bool

Whether to use learnable scaling. Default is True.

True
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    num_heads: int = 8,
    transform_type: str = "dct",
    use_scaling: bool = True,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = LSTAttention(
        hidden_dim=hidden_dim,
        num_heads=num_heads,
        transform_type=cast(TransformLSTType, transform_type),
        learnable_scale=use_scaling,
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

WaveletBlock

WaveletBlock(hidden_dim: int, wavelet: str = 'db4', levels: int = 3, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

Wavelet transformer block with wavelet mixing.

This block uses discrete wavelet transforms for multiscale token mixing.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
wavelet str

Type of wavelet. Default is 'db4'.

'db4'
levels int

Number of decomposition levels. Default is 3.

3
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    wavelet: str = "db4",
    levels: int = 3,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = WaveletMixing(
        hidden_dim=hidden_dim,
        wavelet=cast(WaveletType, wavelet),
        levels=levels,
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

FNOBlock

FNOBlock(hidden_dim: int, modes: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

FNO transformer block with Fourier neural operator.

This block uses Fourier neural operators for learning mappings between function spaces.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension of the model.

required
modes int | None

Number of Fourier modes. Default is 16.

None
num_layers int

Number of FNO layers. Default is 1.

1
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    modes: int | None = None,
    num_layers: int = 1,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if modes is None:
        modes = 16

    # Use FourierNeuralOperator for mixing
    if num_layers == 1:
        mixing_layer: nn.Module = FourierNeuralOperator(
            hidden_dim=hidden_dim,
            modes=modes,
            activation=cast(ActivationType, activation),
        )
    else:
        # Stack multiple FNO layers
        layers = []
        for _ in range(num_layers):
            layers.append(
                FourierNeuralOperator(
                    hidden_dim=hidden_dim,
                    modes=modes,
                    activation=cast(ActivationType, activation),
                )
            )
        mixing_layer = nn.Sequential(*layers)

    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

FNO2DBlock

FNO2DBlock(hidden_dim: int, modes_h: int | None = None, modes_w: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

2D FNO transformer block for image or grid data.

This block uses 2D Fourier neural operators for spatial data processing.

Parameters:

Name Type Description Default
hidden_dim int

Hidden dimension (number of channels).

required
modes_h int | None

Number of Fourier modes for height. Default is 16.

None
modes_w int | None

Number of Fourier modes for width. Default is 16.

None
num_layers int

Number of FNO layers. Default is 1.

1
ffn_hidden_dim int | None

Hidden dimension of the FFN. Default is 4 * hidden_dim.

None
activation str

Activation function. Default is 'gelu'.

'gelu'
dropout float

Dropout probability. Default is 0.0.

0.0
norm_eps float

Epsilon for layer normalization. Default is 1e-12.

1e-12
Source code in spectrans/blocks/spectral.py
def __init__(
    self,
    hidden_dim: int,
    modes_h: int | None = None,
    modes_w: int | None = None,
    num_layers: int = 1,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if modes_h is None:
        modes_h = 16
    if modes_w is None:
        modes_w = 16

    # For 2D, we use FourierNeuralOperator with 2D mode specification
    modes_2d = (modes_h, modes_w)
    if num_layers == 1:
        mixing_layer: nn.Module = FourierNeuralOperator(
            hidden_dim=hidden_dim,
            modes=modes_2d,  # Use 2D mode tuple
            activation=cast(ActivationType, activation),
        )
    else:
        # Stack multiple FNO layers
        layers = []
        for _ in range(num_layers):
            layers.append(
                FourierNeuralOperator(
                    hidden_dim=hidden_dim,
                    modes=modes_2d,  # Use 2D mode tuple
                    activation=cast(ActivationType, activation),
                )
            )
        mixing_layer = nn.Sequential(*layers)

    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

Functions