Transformer Blocks¶

spectrans.blocks ¶

Transformer block implementations for spectral architectures.

This module provides transformer blocks that combine spectral mixing or attention layers with feedforward networks, residual connections, and normalization. The blocks implement different architectural patterns including pre-norm, post-norm, parallel, and hybrid configurations for various spectral transformer models.

Modules:

Name	Description
`base`	Base classes for transformer blocks.
`hybrid`	Hybrid blocks combining multiple mixing strategies.
`spectral`	Spectral transformer blocks using frequency-domain methods.

Classes:

Name	Description
`AFNOBlock`	Adaptive Fourier Neural Operator block with mode truncation.
`AdaptiveBlock`	Blends or selects among mixing layers with an input-dependent gate.
`AlternatingBlock`	Uses one of two mixing layers, chosen by a flag.
`CascadeBlock`	Applies several mixing layers in sequence, each with its own residual connection.
`FeedForwardNetwork`	Standard MLP feedforward network.
`FNetBlock`	FNet-style block with Fourier mixing.
`FNO2DBlock`	2D Fourier Neural Operator block for spatial data.
`FNOBlock`	1D Fourier Neural Operator block.
`GFNetBlock`	Global Filter Network block with learnable filters.
`HybridBlock`	Base class for blocks that combine multiple mixing strategies.
`LSTBlock`	Linear Spectral Transform block.
`MultiscaleBlock`	Applies several mixing layers in parallel and fuses their outputs.
`ParallelBlock`	Parallel execution of mixing and feedforward.
`PostNormBlock`	Post-normalization transformer block.
`PreNormBlock`	Pre-normalization transformer block.
`SpectralAttentionBlock`	Block using spectral attention mechanisms.
`TransformerBlock`	Base class for sequential pre-norm and post-norm transformer blocks.
`WaveletBlock`	Block using wavelet transforms for mixing.

Examples:

Using an FNet block:

>>> import torch
>>> from spectrans.blocks import FNetBlock
>>>
>>> block = FNetBlock(hidden_dim=768, ffn_hidden_dim=3072)
>>> x = torch.randn(32, 512, 768)
>>> output = block(x)
>>> assert output.shape == x.shape

Using a hybrid block with multiple mixing strategies:

>>> from spectrans.blocks import AlternatingBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> from spectrans.layers.mixing.wavelet import WaveletMixing
>>>
>>> layer1 = FourierMixing(hidden_dim=512)
>>> layer2 = WaveletMixing(hidden_dim=512, wavelet='db4')
>>> block = AlternatingBlock(layer1=layer1, layer2=layer2, hidden_dim=512)
>>> inputs = torch.randn(32, 512, 512)  # last dimension must match hidden_dim=512
>>> output = block(inputs)

Using parallel execution:

>>> from spectrans.blocks import ParallelBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>>
>>> mixing = FourierMixing(hidden_dim=768)
>>> block = ParallelBlock(mixing_layer=mixing, hidden_dim=768)
>>> output = block(x)

Notes

Architectural Patterns:

Pre-Norm: LayerNorm → Mixing → Residual → LayerNorm → FFN → Residual
Post-Norm: Mixing → Residual → LayerNorm → FFN → Residual → LayerNorm
Parallel: LayerNorm → Mixing and FFN on the same normalized input → Sum → single Residual
Hybrid: Multiple mixing strategies combined with learnable or fixed weights

Complexity Comparison:

Standard Transformer: \(O(n^2 d)\) per block
FNet Block: \(O(nd \log n)\) per block
GFNet Block: \(O(nd \log n)\) with learnable parameters
Wavelet Block: \(O(nd)\) with multi-resolution analysis
Hybrid Block: Weighted combination of component complexities
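To make the asymptotic comparison concrete, the sketch below plugs the shapes from the FNet example (n = 512, d = 768) into the leading terms only. It ignores constant factors, the FFN cost, and lower-order terms, and the function names are illustrative, not part of spectrans.

```python
import math


def attention_mixing_cost(n: int, d: int) -> int:
    """Leading-order cost of self-attention token mixing: n^2 * d."""
    return n * n * d


def fourier_mixing_cost(n: int, d: int) -> float:
    """Leading-order cost of FFT-based token mixing: n * d * log2(n)."""
    return n * d * math.log2(n)


n, d = 512, 768
attn = attention_mixing_cost(n, d)
fft = fourier_mixing_cost(n, d)
ratio = attn / fft  # algebraically this is n / log2(n)

print(f"attention ~ {attn:.3e}, fourier ~ {fft:.3e}, ratio ~ {ratio:.1f}x")
```

The ratio grows as n / log n, so the gap widens with sequence length; measured speedups depend on hardware and implementation and will generally differ from this estimate.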

All blocks maintain:

Residual connections for gradient flow
LayerNorm for training stability
Dropout for regularization
Optional activation checkpointing

References

James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, and Santiago Ontanon. 2022. FNet: Mixing tokens with Fourier transforms. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 4296-4313, Seattle.

Yongming Rao, Wenliang Zhao, Zheng Zhu, Jiwen Lu, and Jie Zhou. 2021. Global filter networks for image classification. In Advances in Neural Information Processing Systems 34 (NeurIPS 2021), pages 980-993.

Classes¶

FeedForwardNetwork ¶

FeedForwardNetwork(hidden_dim: int, ffn_hidden_dim: int, activation: str = 'gelu', dropout: float = 0.0)

Bases: Module

Standard feedforward network for transformer blocks.

A two-layer MLP with configurable activation function and dropout.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Input and output dimension.	required
`ffn_hidden_dim`	`int`	Hidden dimension of the FFN.	required
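`activation`	`str`	Activation function name: one of 'gelu', 'relu', 'silu', 'tanh', 'sigmoid', 'elu', or 'leaky_relu'. Any other value raises ValueError. Default is 'gelu'.	`'gelu'`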
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`

Attributes:

Name	Type	Description
`fc1`	`Linear`	First linear layer.
`fc2`	`Linear`	Second linear layer.
`activation`	`Module`	Activation function.
`dropout`	`Dropout`	Dropout layer.

Methods:

Name	Description
`forward`	Forward pass through the FFN.

Source code in spectrans/blocks/base.py

def __init__(
    self,
    hidden_dim: int,
    ffn_hidden_dim: int,
    activation: str = "gelu",
    dropout: float = 0.0,
):
    super().__init__()
    self.hidden_dim = hidden_dim
    self.ffn_hidden_dim = ffn_hidden_dim

    # Linear layers
    self.fc1 = nn.Linear(hidden_dim, ffn_hidden_dim)
    self.fc2 = nn.Linear(ffn_hidden_dim, hidden_dim)

    # Activation function
    activation_functions = {
        "gelu": nn.GELU(),
        "relu": nn.ReLU(),
        "silu": nn.SiLU(),
        "tanh": nn.Tanh(),
        "sigmoid": nn.Sigmoid(),
        "elu": nn.ELU(),
        "leaky_relu": nn.LeakyReLU(),
    }
    if activation not in activation_functions:
        raise ValueError(f"Unknown activation: {activation}")
    self.activation = activation_functions[activation]

    # Dropout
    self.dropout = nn.Dropout(dropout)
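As a rough sizing aid, the parameter count of this network follows directly from the two linear layers in the constructor above. The sketch assumes both layers keep their bias terms (the `nn.Linear` default) and that the activation and dropout modules add no learnable parameters; `ffn_parameter_count` is a hypothetical helper, not a spectrans function.

```python
def ffn_parameter_count(hidden_dim: int, ffn_hidden_dim: int) -> int:
    """Count the weights and biases of fc1 followed by fc2."""
    fc1 = hidden_dim * ffn_hidden_dim + ffn_hidden_dim  # weight matrix + bias
    fc2 = ffn_hidden_dim * hidden_dim + hidden_dim      # weight matrix + bias
    return fc1 + fc2


# Shapes taken from the FNet example: hidden_dim=768, ffn_hidden_dim=3072.
total = ffn_parameter_count(768, 3072)
print(f"{total:,} parameters")
```

With the common choice ffn_hidden_dim = 4 * hidden_dim, the count is roughly 8 * hidden_dim^2.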

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the FFN.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (..., hidden_dim).

Source code in spectrans/blocks/base.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the FFN.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (..., hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (..., hidden_dim).
    """
    x = self.fc1(x)
    x = self.activation(x)
    x = self.dropout(x)
    x = self.fc2(x)
    return x

ParallelBlock ¶

ParallelBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: SpectralComponent

Transformer block with parallel mixing and FFN branches.

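This block feeds the same normalized input to the mixing layer and the FFN, sums the two branch outputs, and adds the sum to the input through a single residual connection. Because neither branch depends on the other, they can be computed independently, and the block needs only one LayerNorm.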

Parameters:

Name	Type	Description	Default
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer.
`ffn`	`FeedForwardNetwork`	The feedforward network.
`norm`	`LayerNorm`	Layer normalization.
`dropout`	`Dropout`	Dropout layer.

Methods:

Name	Description
`forward`	Forward pass through the parallel block.

Source code in spectrans/blocks/base.py

def __init__(
    self,
    mixing_layer: MixingLayer | nn.Module,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__()
    self.hidden_dim = hidden_dim
    self.mixing_layer = mixing_layer

    # Default FFN dimension
    if ffn_hidden_dim is None:
        ffn_hidden_dim = 4 * hidden_dim

    # Components
    self.ffn = FeedForwardNetwork(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
    )
    self.norm = nn.LayerNorm(hidden_dim, eps=norm_eps)
    self.dropout = nn.Dropout(dropout)

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the parallel block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/base.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the parallel block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    # Normalize input
    normed = self.norm(x)

    # Process mixing and FFN in parallel
    mixed = self.mixing_layer(normed)
    ffn_out = self.ffn(normed)

    # Combine and add residual
    output: torch.Tensor = x + self.dropout(mixed + ffn_out)

    return output
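The data flow of the parallel forward pass can be traced with plain Python lists. This is a minimal sketch with dropout omitted (as with p = 0.0 or in eval mode); the stand-in layers `identity`, `double`, and `negate` are hypothetical and chosen only so the arithmetic is easy to follow. A sequential pre-norm ordering is included for contrast.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector) -> Vector:
    """Elementwise sum of two equal-length vectors."""
    return [s + t for s, t in zip(a, b)]


def parallel_block(x: Vector, norm: Layer, mixing: Layer, ffn: Layer) -> Vector:
    """x + (mixing(norm(x)) + ffn(norm(x))): one norm, one residual."""
    normed = norm(x)  # computed once and shared by both branches
    return add(x, add(mixing(normed), ffn(normed)))


def sequential_block(
    x: Vector, norm1: Layer, mixing: Layer, norm2: Layer, ffn: Layer
) -> Vector:
    """Pre-norm ordering for contrast: the FFN sees the mixing result."""
    h = add(x, mixing(norm1(x)))
    return add(h, ffn(norm2(h)))


def identity(v: Vector) -> Vector:  # stand-in for normalization
    return list(v)


def double(v: Vector) -> Vector:  # stand-in for the mixing layer
    return [2.0 * t for t in v]


def negate(v: Vector) -> Vector:  # stand-in for the FFN
    return [-t for t in v]


x = [1.0, 2.0]
parallel_out = parallel_block(x, identity, double, negate)  # x + (2x - x)
sequential_out = sequential_block(x, identity, double, identity, negate)  # 3x - 3x
print(parallel_out, sequential_out)
```

The two orderings give different results for the same layers because, in the parallel form, the FFN never sees the output of the mixing layer.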

PostNormBlock ¶

PostNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: TransformerBlock

Transformer block with post-layer normalization.

This block applies layer normalization after the mixing layer and FFN, following the original transformer architecture.

Parameters:

Name	Type	Description	Default
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/base.py

def __init__(
    self,
    mixing_layer: MixingLayer | nn.Module,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if ffn_hidden_dim is None:
        ffn_hidden_dim = 4 * hidden_dim
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        use_pre_norm=False,
        norm_eps=norm_eps,
    )

PreNormBlock ¶

PreNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: TransformerBlock

Transformer block with pre-layer normalization.

This block applies layer normalization before the mixing layer and FFN, which has been shown to improve training stability.

Parameters:

Name	Type	Description	Default
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/base.py

def __init__(
    self,
    mixing_layer: MixingLayer | nn.Module,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if ffn_hidden_dim is None:
        ffn_hidden_dim = 4 * hidden_dim
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        use_pre_norm=True,
        norm_eps=norm_eps,
    )

TransformerBlock ¶

TransformerBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, use_pre_norm: bool = True, norm_eps: float = 1e-12)

Bases: SpectralComponent

Base class for transformer blocks.

A transformer block combines a mixing/attention layer with a feedforward network, using residual connections and layer normalization.

Parameters:

Name	Type	Description	Default
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer for token interaction.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
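`ffn_hidden_dim`	`int \| None`	Hidden dimension of the feedforward network. If None, the block is built without an FFN or second LayerNorm (`ffn` and `norm2` are None). The `PreNormBlock` and `PostNormBlock` subclasses substitute 4 * hidden_dim instead.	`None`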
`activation`	`str`	Activation function for the FFN. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`use_pre_norm`	`bool`	Whether to use pre-layer normalization. Default is True.	`True`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`mixing_layer`	`MixingLayer \| Module`	The mixing or attention layer.
`ffn`	`FeedForwardNetwork \| None`	The feedforward network.
`norm1`	`LayerNorm`	First layer normalization.
`norm2`	`LayerNorm \| None`	Second layer normalization (if FFN is used).
`dropout`	`Dropout`	Dropout layer.
`use_pre_norm`	`bool`	Whether pre-normalization is used.

Methods:

Name	Description
`forward`	Forward pass through the transformer block.

Source code in spectrans/blocks/base.py

def __init__(
    self,
    mixing_layer: MixingLayer | nn.Module,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    use_pre_norm: bool = True,
    norm_eps: float = 1e-12,
):
    super().__init__()
    self.hidden_dim = hidden_dim
    self.mixing_layer = mixing_layer
    self.use_pre_norm = use_pre_norm

    # Layer normalization
    self.norm1 = nn.LayerNorm(hidden_dim, eps=norm_eps)

    # Feedforward network
    if ffn_hidden_dim is not None:
        self.ffn = FeedForwardNetwork(
            hidden_dim=hidden_dim,
            ffn_hidden_dim=ffn_hidden_dim,
            activation=activation,
            dropout=dropout,
        )
        self.norm2 = nn.LayerNorm(hidden_dim, eps=norm_eps)
    else:
        self.ffn = None
        self.norm2 = None

    # Dropout
    self.dropout = nn.Dropout(dropout)

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the transformer block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/base.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the transformer block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    output: torch.Tensor
    if self.use_pre_norm:
        # Pre-norm: normalize before mixing
        h = x + self.dropout(self.mixing_layer(self.norm1(x)))
        if self.ffn is not None and self.norm2 is not None:
            output = h + self.dropout(self.ffn(self.norm2(h)))
        else:
            output = h
    else:
        # Post-norm: normalize after mixing
        h = self.norm1(x + self.dropout(self.mixing_layer(x)))
        if self.ffn is not None and self.norm2 is not None:
            output = self.norm2(h + self.dropout(self.ffn(h)))
        else:
            output = h

    return output
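The practical difference between the two branches of this forward pass is where normalization sits relative to the residual sum. The sketch below reproduces both orderings on a single token vector in plain Python. It assumes a LayerNorm with unit scale and zero shift (the state of a freshly constructed `nn.LayerNorm`), omits dropout, and uses hypothetical stand-in layers `reverse` and `halve` in place of real mixing and FFN modules.

```python
import math
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def mean(v: Vector) -> float:
    return sum(v) / len(v)


def layer_norm(v: Vector, eps: float = 1e-12) -> Vector:
    """Normalize to zero mean and unit variance (no learned scale or shift)."""
    mu = mean(v)
    var = sum((t - mu) ** 2 for t in v) / len(v)
    return [(t - mu) / math.sqrt(var + eps) for t in v]


def add(a: Vector, b: Vector) -> Vector:
    return [s + t for s, t in zip(a, b)]


def pre_norm(x: Vector, mixing: Layer, ffn: Layer) -> Vector:
    """Normalize the input of each sublayer; the residual path stays raw."""
    h = add(x, mixing(layer_norm(x)))
    return add(h, ffn(layer_norm(h)))


def post_norm(x: Vector, mixing: Layer, ffn: Layer) -> Vector:
    """Normalize after each residual sum; the block output is normalized."""
    h = layer_norm(add(x, mixing(x)))
    return layer_norm(add(h, ffn(h)))


def reverse(v: Vector) -> Vector:  # stand-in for the mixing layer
    return v[::-1]


def halve(v: Vector) -> Vector:  # stand-in for the FFN
    return [0.5 * t for t in v]


x = [1.0, 4.0, 10.0]
pre_out = pre_norm(x, reverse, halve)
post_out = post_norm(x, reverse, halve)
print("pre-norm mean:", mean(pre_out), "post-norm mean:", mean(post_out))
```

In this toy setting the pre-norm output keeps the mean of the input (5.0) because the residual path is never normalized, while the post-norm output is re-centered to zero mean and unit variance.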

AdaptiveBlock ¶

AdaptiveBlock(layers: list[MixingLayer | Module], hidden_dim: int, gate_type: str = 'soft', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: HybridBlock

Transformer block that adaptively selects between mixing strategies.

This block uses a gating mechanism to dynamically choose or blend between different mixing strategies based on the input.

Parameters:

Name	Type	Description	Default
`layers`	`list[MixingLayer \| Module]`	List of mixing layers to choose from.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`gate_type`	`str`	Type of gating: 'soft' blends all layers with softmax weights; any other value (conventionally 'hard') runs only the argmax-selected layer for each sample. Default is 'soft'.	`'soft'`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`layers`	`ModuleList`	List of mixing layers.
`gate`	`Linear`	Gating network for layer selection.
`gate_type`	`str`	Type of gating mechanism.

Methods:

Name	Description
`forward`	Forward pass through the adaptive block.

Source code in spectrans/blocks/hybrid.py

def __init__(
    self,
    layers: list[MixingLayer | nn.Module],
    hidden_dim: int,
    gate_type: str = "soft",
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        norm_eps=norm_eps,
    )
    self.layers = nn.ModuleList(layers)
    self.num_layers = len(layers)
    self.gate_type = gate_type

    # Gating network
    self.gate = nn.Linear(hidden_dim, self.num_layers)

    # Initialize gate to uniform weights
    nn.init.constant_(self.gate.weight, 0)
    nn.init.constant_(self.gate.bias, 0)

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the adaptive block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/hybrid.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the adaptive block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    # Normalize input for mixing
    normed = self.norm1(x)

    # Compute gate values
    gate_input = normed.mean(dim=1)  # (batch_size, hidden_dim)
    gate_logits = self.gate(gate_input)  # (batch_size, num_layers)

    if self.gate_type == "soft":
        # Soft gating: weighted sum of all layers
        gate_weights = F.softmax(gate_logits, dim=-1)  # (batch_size, num_layers)

        # Apply each layer and combine
        mixed = torch.zeros_like(x)
        for i, layer in enumerate(self.layers):
            weight = gate_weights[:, i : i + 1].unsqueeze(1)  # (batch_size, 1, 1)
            mixed = mixed + weight * layer(normed)
    else:  # hard gating
        # Hard gating: select single layer
        gate_idx = torch.argmax(gate_logits, dim=-1)  # (batch_size,)

        # Apply selected layer for each sample
        mixed = torch.zeros_like(x)
        for i in range(x.shape[0]):
            idx = int(gate_idx[i].item())
            mixed[i] = self.layers[idx](normed[i : i + 1])

    # Add residual
    h = x + self.dropout(mixed)

    # Apply FFN with pre-norm
    output: Tensor = h + self.dropout(self.ffn(self.norm2(h)))

    return output
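Because the constructor zero-initializes the gate's weight and bias, every gate logit starts at zero regardless of the input. The sketch below works through what that implies for one sample in plain Python. The helper names and the toy layer outputs are illustrative assumptions, not spectrans code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_gate_blend(weights: List[float], layer_outputs: List[List[float]]) -> List[float]:
    """Weighted sum of the per-layer outputs for one sample."""
    width = len(layer_outputs[0])
    return [
        sum(w * out[j] for w, out in zip(weights, layer_outputs))
        for j in range(width)
    ]


num_layers = 3
zero_init_logits = [0.0] * num_layers  # what a zero weight and zero bias produce
weights = softmax(zero_init_logits)    # uniform: each layer gets 1/3

layer_outputs = [[3.0, 0.0], [0.0, 3.0], [3.0, 3.0]]  # toy outputs of three layers
blended = soft_gate_blend(weights, layer_outputs)     # plain average at initialization

# Hard gating takes an argmax instead; with tied logits Python's max
# returns the first index.
hard_choice = max(range(num_layers), key=lambda i: zero_init_logits[i])
print(weights, blended, hard_choice)
```

With soft gating the block therefore starts as a plain average of its layers and can learn to specialize. With hard gating the selection goes through an argmax, which is not differentiable, so the gate receives no gradient through that path. Tie-breaking in the tensor case is governed by `torch.argmax`, so check its documentation rather than relying on the Python behavior shown here.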

AlternatingBlock ¶

AlternatingBlock(layer1: MixingLayer | Module, layer2: MixingLayer | Module, hidden_dim: int, use_layer1: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: HybridBlock

Transformer block that alternates between two mixing strategies.

This block can be used in alternating patterns, e.g., even layers use one type of mixing and odd layers use another.

Parameters:

Name	Type	Description	Default
`layer1`	`MixingLayer \| Module`	First mixing layer.	required
`layer2`	`MixingLayer \| Module`	Second mixing layer.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`use_layer1`	`bool`	Whether to use layer1 (True) or layer2 (False). Default is True.	`True`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`layer1`	`MixingLayer \| Module`	First mixing layer.
`layer2`	`MixingLayer \| Module`	Second mixing layer.
`use_layer1`	`bool`	Which layer to use for this block.

Methods:

Name	Description
`forward`	Forward pass through the alternating block.
`set_layer`	Set which layer to use.

Source code in spectrans/blocks/hybrid.py

def __init__(
    self,
    layer1: MixingLayer | nn.Module,
    layer2: MixingLayer | nn.Module,
    hidden_dim: int,
    use_layer1: bool = True,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        norm_eps=norm_eps,
    )
    self.layer1 = layer1
    self.layer2 = layer2
    self.use_layer1 = use_layer1

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the alternating block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/hybrid.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the alternating block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    # Select which layer to use
    mixing_layer = self.layer1 if self.use_layer1 else self.layer2

    # Apply mixing with pre-norm
    h = x + self.dropout(mixing_layer(self.norm1(x)))

    # Apply FFN with pre-norm
    output: Tensor = h + self.dropout(self.ffn(self.norm2(h)))

    return output

set_layer ¶

set_layer(use_layer1: bool) -> None

Set which layer to use.

Parameters:

Name	Type	Description	Default
`use_layer1`	`bool`	Whether to use layer1 (True) or layer2 (False).	required

Source code in spectrans/blocks/hybrid.py

def set_layer(self, use_layer1: bool) -> None:
    """Set which layer to use.

    Parameters
    ----------
    use_layer1 : bool
        Whether to use layer1 (True) or layer2 (False).
    """
    self.use_layer1 = use_layer1
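One way to get the even/odd pattern described for this class is to derive `use_layer1` from the block's index when building a stack. The sketch below shows only that bookkeeping: `AlternatingFlag` is a hypothetical stand-in that tracks the switch and does no tensor computation, and `build_alternating_stack` is not a spectrans function.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlternatingFlag:
    """Hypothetical stand-in that only tracks the use_layer1 switch."""

    use_layer1: bool = True

    def set_layer(self, use_layer1: bool) -> None:
        self.use_layer1 = use_layer1

    def active(self) -> str:
        return "layer1" if self.use_layer1 else "layer2"


def build_alternating_stack(depth: int) -> List[AlternatingFlag]:
    """Even-indexed blocks use layer1, odd-indexed blocks use layer2."""
    return [AlternatingFlag(use_layer1=(i % 2 == 0)) for i in range(depth)]


stack = build_alternating_stack(4)
pattern = [block.active() for block in stack]

# Flip one block after construction, as set_layer allows.
stack[0].set_layer(False)
flipped = [block.active() for block in stack]
print(pattern, flipped)
```

In the real block both `layer1` and `layer2` are stored on the instance, so the inactive layer presumably still contributes parameters to the model even though `forward` never calls it.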

CascadeBlock ¶

CascadeBlock(layers: list[MixingLayer | Module], hidden_dim: int, share_norm: bool = False, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: HybridBlock

Transformer block that cascades multiple mixing strategies.

This block applies mixing layers sequentially, allowing each layer to refine the representations produced by the previous one.

Parameters:

Name	Type	Description	Default
`layers`	`list[MixingLayer \| Module]`	List of mixing layers to cascade.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
`share_norm`	`bool`	If True, every mixing layer reuses the same LayerNorm (`norm1`); if False, each mixing layer gets its own LayerNorm. Default is False.	`False`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`layers`	`ModuleList`	List of mixing layers to cascade.
`norms`	`ModuleList`	Normalization layers for each mixing layer.
`share_norm`	`bool`	Whether normalization is shared.

Methods:

Name	Description
`forward`	Forward pass through the cascade block.

Source code in spectrans/blocks/hybrid.py

def __init__(
    self,
    layers: list[MixingLayer | nn.Module],
    hidden_dim: int,
    share_norm: bool = False,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        norm_eps=norm_eps,
    )
    self.layers = nn.ModuleList(layers)
    self.share_norm = share_norm

    # Create normalization layers
    if share_norm:
        # Use the same norm for all layers
        self.norms = nn.ModuleList([self.norm1] * len(layers))
    else:
        # Create separate norms for each layer
        self.norms = nn.ModuleList([nn.LayerNorm(hidden_dim, eps=norm_eps) for _ in layers])

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the cascade block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/hybrid.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the cascade block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    # Cascade through mixing layers
    h = x
    for layer, norm in zip(self.layers, self.norms, strict=False):
        h = h + self.dropout(layer(norm(h)))

    # Apply FFN with pre-norm
    output: Tensor = h + self.dropout(self.ffn(self.norm2(h)))

    return output
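The `share_norm` option relies on a plain Python detail in the constructor: multiplying a one-element list repeats the same object reference, whereas a comprehension creates a new object per iteration. The sketch below shows the difference with a hypothetical `Norm` stand-in that carries a single scale value in place of real LayerNorm parameters.

```python
class Norm:
    """Hypothetical stand-in for a normalization layer with one learnable scale."""

    def __init__(self) -> None:
        self.scale = 1.0


num_layers = 3
norm1 = Norm()

shared = [norm1] * num_layers                   # pattern used when share_norm=True
separate = [Norm() for _ in range(num_layers)]  # pattern used when share_norm=False

# Change the scale through the first slot of each list.
shared[0].scale = 2.0
separate[0].scale = 2.0

shared_scales = [n.scale for n in shared]      # every slot sees the change
separate_scales = [n.scale for n in separate]  # only the first slot changes
print(shared_scales, separate_scales)
```

Sharing therefore ties the normalization parameters of all cascade stages together, which saves a few parameters but stops each stage from learning its own scale and shift.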

HybridBlock ¶

HybridBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: SpectralComponent

Base class for hybrid transformer blocks.

This class provides the foundation for blocks that combine multiple mixing strategies in various ways.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
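`hidden_dim`	`int`	Hidden dimension of the model.
`ffn`	`FeedForwardNetwork`	The feedforward network (always created; its hidden size defaults to 4 * hidden_dim).
`norm1`, `norm2`, `norm3`	`LayerNorm`	Normalization layers for use by subclasses.
`dropout`	`Dropout`	Dropout layer.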

Source code in spectrans/blocks/hybrid.py

def __init__(
    self,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__()
    self.hidden_dim = hidden_dim

    # Default FFN dimension
    if ffn_hidden_dim is None:
        ffn_hidden_dim = 4 * hidden_dim

    # Feedforward network
    self.ffn = FeedForwardNetwork(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
    )

    # Normalization layers (to be used by subclasses)
    self.norm1 = nn.LayerNorm(hidden_dim, eps=norm_eps)
    self.norm2 = nn.LayerNorm(hidden_dim, eps=norm_eps)
    self.norm3 = nn.LayerNorm(hidden_dim, eps=norm_eps)

    # Dropout
    self.dropout = nn.Dropout(dropout)

MultiscaleBlock ¶

MultiscaleBlock(layers: list[MixingLayer | Module], hidden_dim: int, fusion_type: str = 'add', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: HybridBlock

Transformer block that processes multiple scales in parallel.

This block applies different mixing strategies at different scales and combines their outputs, capturing both local and global patterns.

Parameters:

Name	Type	Description	Default
`layers`	`list[MixingLayer \| Module]`	List of mixing layers for different scales.	required
`hidden_dim`	`int`	Hidden dimension of the model.	required
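`fusion_type`	`str`	How to fuse outputs: 'add' averages them, 'weighted' combines them with softmax-normalized learnable weights, and 'concat' concatenates them along the last dimension and projects back to hidden_dim. Any other value raises ValueError in forward. Default is 'add'.	`'add'`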
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Attributes:

Name	Type	Description
`layers`	`ModuleList`	List of mixing layers for different scales.
`fusion_type`	`str`	Type of fusion mechanism.
`fusion_weights`	`Parameter \| None`	Learnable weights for fusion (if fusion_type is 'weighted').
`fusion_proj`	`Linear \| None`	Projection for concatenation fusion.

Methods:

Name	Description
`forward`	Forward pass through the multiscale block.

Source code in spectrans/blocks/hybrid.py

def __init__(
    self,
    layers: list[MixingLayer | nn.Module],
    hidden_dim: int,
    fusion_type: str = "add",
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    super().__init__(
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=activation,
        dropout=dropout,
        norm_eps=norm_eps,
    )
    self.layers = nn.ModuleList(layers)
    self.num_scales = len(layers)
    self.fusion_type = fusion_type

    # Type annotations for optional attributes
    self.fusion_weights: nn.Parameter | None
    self.fusion_proj: nn.Linear | None

    # Fusion mechanisms
    if fusion_type == "weighted":
        self.fusion_weights = nn.Parameter(torch.ones(self.num_scales) / self.num_scales)
    else:
        self.fusion_weights = None

    if fusion_type == "concat":
        self.fusion_proj = nn.Linear(hidden_dim * self.num_scales, hidden_dim)
    else:
        self.fusion_proj = None

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Forward pass through the multiscale block.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (batch_size, sequence_length, hidden_dim).	required

Returns:

Type	Description
`Tensor`	Output tensor of shape (batch_size, sequence_length, hidden_dim).

Source code in spectrans/blocks/hybrid.py

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass through the multiscale block.

    Parameters
    ----------
    x : torch.Tensor
        Input tensor of shape (batch_size, sequence_length, hidden_dim).

    Returns
    -------
    torch.Tensor
        Output tensor of shape (batch_size, sequence_length, hidden_dim).
    """
    # Normalize input
    normed = self.norm1(x)

    # Apply each scale
    outputs = []
    for layer in self.layers:
        outputs.append(layer(normed))

    # Fuse outputs
    if self.fusion_type == "add":
        mixed = sum(outputs) / self.num_scales
    elif self.fusion_type == "weighted":
        assert self.fusion_weights is not None, (
            "fusion_weights should not be None for weighted fusion"
        )
        weights = F.softmax(self.fusion_weights, dim=0)
        mixed = sum(w * out for w, out in zip(weights, outputs, strict=False))
    elif self.fusion_type == "concat":
        mixed = torch.cat(outputs, dim=-1)
        assert self.fusion_proj is not None, "fusion_proj should not be None for concat fusion"
        mixed = self.fusion_proj(mixed)
    else:
        raise ValueError(f"Unknown fusion type: {self.fusion_type}")

    # Add residual
    h = x + self.dropout(mixed)

    # Apply FFN with pre-norm
    output: Tensor = h + self.dropout(self.ffn(self.norm2(h)))

    return output

AFNOBlock ¶

AFNOBlock(hidden_dim: int, sequence_length: int, modes: int | None = None, mlp_hidden_dim: int | None = None, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

AFNO transformer block with adaptive Fourier neural operator.

This block uses adaptive Fourier mode selection with MLPs in the frequency domain for token mixing.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`sequence_length`	`int`	Maximum sequence length.	required
`modes`	`int \| None`	Number of Fourier modes to retain. Default is sequence_length // 2.	`None`
`mlp_hidden_dim`	`int \| None`	Hidden dimension of the frequency-domain MLP. Default is 2 * hidden_dim (an `mlp_ratio` of 2.0).	`None`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    sequence_length: int,
    modes: int | None = None,
    mlp_hidden_dim: int | None = None,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    # Derive mlp_ratio from mlp_hidden_dim when given; otherwise fall back to 2.0
    mlp_ratio = mlp_hidden_dim / hidden_dim if mlp_hidden_dim is not None else 2.0

    mixing_layer = AFNOMixing(
        hidden_dim=hidden_dim,
        max_sequence_length=sequence_length,
        modes_seq=modes,
        modes_hidden=modes,
        mlp_ratio=mlp_ratio,
        activation=cast(ActivationType, activation),
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )

FNetBlock ¶

FNetBlock(hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

FNet transformer block with Fourier mixing.

This block mixes tokens with Fourier transforms instead of attention, which costs \(O(n \log n)\) in the sequence length \(n\) rather than the \(O(n^2)\) of self-attention.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = FourierMixing(hidden_dim=hidden_dim)
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
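
As a rough illustration of parameter-free Fourier mixing, the sketch below follows the formulation from the FNet paper: an FFT over the hidden dimension, an FFT over the sequence dimension, and then the real part. Whether `FourierMixing` in spectrans does exactly this is an assumption; the function name `fourier_mix` is hypothetical.

```python
import torch


def fourier_mix(x: torch.Tensor) -> torch.Tensor:
    """FNet-style mixing: FFT over hidden and sequence dims, keep the real part."""
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real


torch.manual_seed(0)
x = torch.randn(2, 16, 8)  # (batch_size, sequence_length, hidden_dim)
mixed = fourier_mix(x)
```

The output is real-valued and has the same shape as the input, so it can be dropped into the residual path of a pre-norm block without any projection.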

FNO2DBlock ¶

FNO2DBlock(hidden_dim: int, modes_h: int | None = None, modes_w: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

2D FNO transformer block for image or grid data.

This block uses 2D Fourier neural operators for spatial data processing.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension (number of channels).	required
`modes_h`	`int \| None`	Number of Fourier modes for height. Default is 16.	`None`
`modes_w`	`int \| None`	Number of Fourier modes for width. Default is 16.	`None`
`num_layers`	`int`	Number of FNO layers. Default is 1.	`1`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    modes_h: int | None = None,
    modes_w: int | None = None,
    num_layers: int = 1,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if modes_h is None:
        modes_h = 16
    if modes_w is None:
        modes_w = 16

    # For 2D, we use FourierNeuralOperator with 2D mode specification
    modes_2d = (modes_h, modes_w)
    if num_layers == 1:
        mixing_layer: nn.Module = FourierNeuralOperator(
            hidden_dim=hidden_dim,
            modes=modes_2d,  # Use 2D mode tuple
            activation=cast(ActivationType, activation),
        )
    else:
        # Stack multiple FNO layers
        layers = []
        for _ in range(num_layers):
            layers.append(
                FourierNeuralOperator(
                    hidden_dim=hidden_dim,
                    modes=modes_2d,  # Use 2D mode tuple
                    activation=cast(ActivationType, activation),
                )
            )
        mixing_layer = nn.Sequential(*layers)

    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
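
To show what `modes_h` and `modes_w` control, here is a minimal sketch of 2D Fourier mode truncation. It covers only the truncation step (a real FNO layer also multiplies the retained modes by learned complex weights), and the (batch, height, width, channels) layout and the name `truncate_modes_2d` are assumptions made for this example.

```python
import math

import torch


def truncate_modes_2d(x: torch.Tensor, modes_h: int, modes_w: int) -> torch.Tensor:
    """Keep only low-frequency 2D Fourier modes of x with shape (batch, H, W, channels)."""
    height, width = x.shape[1], x.shape[2]
    x_ft = torch.fft.rfft2(x, dim=(1, 2))
    kept = torch.zeros_like(x_ft)
    # Height holds both positive (front) and negative (back) frequencies;
    # rfft2 stores only the non-negative frequencies along width.
    kept[:, :modes_h, :modes_w] = x_ft[:, :modes_h, :modes_w]
    kept[:, -modes_h:, :modes_w] = x_ft[:, -modes_h:, :modes_w]
    return torch.fft.irfft2(kept, s=(height, width), dim=(1, 2))


grid = torch.arange(32, dtype=torch.float32)
# Waves along the width axis at frequency 2 (retained) and 10 (discarded).
low = torch.cos(2 * math.pi * 2 * grid / 32).view(1, 1, 32, 1).repeat(1, 32, 1, 1)
high = torch.cos(2 * math.pi * 10 * grid / 32).view(1, 1, 32, 1).repeat(1, 32, 1, 1)

low_out = truncate_modes_2d(low, modes_h=4, modes_w=4)
high_out = truncate_modes_2d(high, modes_h=4, modes_w=4)
```

With four modes per axis, the frequency-2 wave passes through unchanged while the frequency-10 wave is removed, which is the low-pass behaviour that a small mode count imposes.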

FNOBlock ¶

FNOBlock(hidden_dim: int, modes: int | None = None, num_layers: int = 1, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

FNO transformer block with Fourier neural operator.

This block uses Fourier neural operators for learning mappings between function spaces.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`modes`	`int \| None`	Number of Fourier modes. Default is 16.	`None`
`num_layers`	`int`	Number of FNO layers. Default is 1.	`1`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    modes: int | None = None,
    num_layers: int = 1,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    if modes is None:
        modes = 16

    # Use FourierNeuralOperator for mixing
    if num_layers == 1:
        mixing_layer: nn.Module = FourierNeuralOperator(
            hidden_dim=hidden_dim,
            modes=modes,
            activation=cast(ActivationType, activation),
        )
    else:
        # Stack multiple FNO layers
        layers = []
        for _ in range(num_layers):
            layers.append(
                FourierNeuralOperator(
                    hidden_dim=hidden_dim,
                    modes=modes,
                    activation=cast(ActivationType, activation),
                )
            )
        mixing_layer = nn.Sequential(*layers)

    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
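
The following sketch shows one common way to build a 1D spectral convolution and to stack it the way the constructor stacks layers when `num_layers > 1`. The class `SpectralConv1d` is a hypothetical stand-in, not the spectrans `FourierNeuralOperator`, whose internals are not shown in this section.

```python
import torch
import torch.nn as nn


class SpectralConv1d(nn.Module):
    """Hypothetical 1D spectral convolution over the sequence dimension."""

    def __init__(self, hidden_dim: int, modes: int = 16) -> None:
        super().__init__()
        self.modes = modes
        scale = 1.0 / hidden_dim
        # One complex (hidden_dim x hidden_dim) matrix per retained mode.
        self.weight = nn.Parameter(
            scale * torch.randn(modes, hidden_dim, hidden_dim, dtype=torch.cfloat)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, sequence_length, hidden_dim)
        seq_len = x.shape[1]
        x_ft = torch.fft.rfft(x, dim=1)
        modes = min(self.modes, x_ft.shape[1])  # guard against short sequences
        out_ft = torch.zeros_like(x_ft)
        # Mix channels separately for each retained low-frequency mode.
        out_ft[:, :modes] = torch.einsum(
            "bmi,mio->bmo", x_ft[:, :modes], self.weight[:modes]
        )
        return torch.fft.irfft(out_ft, n=seq_len, dim=1)


torch.manual_seed(0)
x = torch.randn(2, 64, 8)
single = SpectralConv1d(hidden_dim=8, modes=16)
stacked = nn.Sequential(*(SpectralConv1d(hidden_dim=8, modes=16) for _ in range(3)))

y_single = single(x)
y_stacked = stacked(x)
```

Each layer preserves the input shape, which is what makes wrapping several of them in `nn.Sequential` a valid mixing layer for the pre-norm block.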

GFNetBlock ¶

GFNetBlock(hidden_dim: int, sequence_length: int, ffn_hidden_dim: int | None = None, filter_activation: str = 'sigmoid', filter_init_std: float = 0.02, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

GFNet transformer block with global filter mixing.

This block uses learnable frequency-domain filters for token mixing.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`sequence_length`	`int`	Maximum sequence length.	required
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`filter_activation`	`str`	Activation for filters ('sigmoid', 'tanh', or 'identity'). Default is 'sigmoid'.	`'sigmoid'`
`filter_init_std`	`float`	Initialization std for filters. Default is 0.02.	`0.02`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    sequence_length: int,
    ffn_hidden_dim: int | None = None,
    filter_activation: str = "sigmoid",
    filter_init_std: float = 0.02,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = GlobalFilterMixing(
        hidden_dim=hidden_dim,
        sequence_length=sequence_length,
        activation=cast(ActivationType, filter_activation),
        filter_init_std=filter_init_std,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
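
A global filter can be sketched as an element-wise complex multiplication in the frequency domain. The module below is a simplified illustration under that assumption; it omits the `filter_activation` step, and the class name `GlobalFilter` and its storage of real and imaginary parts in a trailing dimension are choices made for this example, not the spectrans implementation.

```python
import torch
import torch.nn as nn


class GlobalFilter(nn.Module):
    """Hypothetical global filter: element-wise complex filter in the frequency domain."""

    def __init__(self, hidden_dim: int, sequence_length: int, init_std: float = 0.02) -> None:
        super().__init__()
        self.sequence_length = sequence_length
        num_freqs = sequence_length // 2 + 1  # length of the rfft output
        # Real and imaginary filter parts are stored in the last dimension.
        self.filter = nn.Parameter(init_std * torch.randn(num_freqs, hidden_dim, 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch_size, sequence_length, hidden_dim)
        x_ft = torch.fft.rfft(x, dim=1, norm="ortho")
        x_ft = x_ft * torch.view_as_complex(self.filter)
        return torch.fft.irfft(x_ft, n=self.sequence_length, dim=1, norm="ortho")


torch.manual_seed(0)
x = torch.randn(2, 16, 8)
layer = GlobalFilter(hidden_dim=8, sequence_length=16)
filtered = layer(x)

# Setting the filter to 1 + 0j turns the layer into the identity.
with torch.no_grad():
    layer.filter.zero_()
    layer.filter[..., 0] = 1.0
identity_out = layer(x)
```

The filter has one entry per frequency bin, so its size depends on `sequence_length`; this is why the block takes the sequence length as a required constructor argument.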

LSTBlock ¶

LSTBlock(hidden_dim: int, num_heads: int = 8, transform_type: str = 'dct', use_scaling: bool = True, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

LST transformer block with linear spectral transform attention.

This block uses orthogonal transforms (DCT, DST, or Hadamard) for attention computation.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`num_heads`	`int`	Number of attention heads. Default is 8.	`8`
`transform_type`	`str`	Type of transform ('dct', 'dst', or 'hadamard'). Default is 'dct'.	`'dct'`
`use_scaling`	`bool`	Whether to use learnable scaling. Default is True.	`True`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    num_heads: int = 8,
    transform_type: str = "dct",
    use_scaling: bool = True,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = LSTAttention(
        hidden_dim=hidden_dim,
        num_heads=num_heads,
        transform_type=cast(TransformLSTType, transform_type),
        learnable_scale=use_scaling,
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
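
The property that matters for the transforms named above is orthogonality: the inverse is simply the transpose. The sketch below builds an orthonormal DCT-II matrix explicitly to demonstrate this; it is an illustration of the transform only, and the function name `dct_matrix` is hypothetical rather than part of `LSTAttention`.

```python
import math

import torch


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II matrix of size (n, n)."""
    k = torch.arange(n, dtype=torch.float64).unsqueeze(1)  # frequency index
    t = torch.arange(n, dtype=torch.float64).unsqueeze(0)  # position index
    mat = math.sqrt(2.0 / n) * torch.cos(math.pi * (t + 0.5) * k / n)
    mat[0] = mat[0] / math.sqrt(2.0)  # rescale the DC row for orthonormality
    return mat


torch.manual_seed(0)
C = dct_matrix(8)
x = torch.randn(8, 4, dtype=torch.float64)  # (sequence_length, hidden_dim)
x_rec = C.T @ (C @ x)  # the inverse of an orthogonal transform is its transpose
```

A dense matrix product costs \(O(n^2)\) per channel; practical implementations typically use a fast algorithm instead, so this form is meant for checking the math rather than for speed.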

SpectralAttentionBlock ¶

SpectralAttentionBlock(hidden_dim: int, num_heads: int = 8, num_features: int | None = None, kernel_type: str = 'gaussian', ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

Spectral attention transformer block.

This block uses spectral attention with random Fourier features for kernel approximation.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`num_heads`	`int`	Number of attention heads. Default is 8.	`8`
`num_features`	`int \| None`	Number of random features. Default is 256.	`None`
`kernel_type`	`str`	Type of kernel ('gaussian' or 'laplacian'). Default is 'gaussian'.	`'gaussian'`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    num_heads: int = 8,
    num_features: int | None = None,
    kernel_type: str = "gaussian",
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = SpectralAttention(
        hidden_dim=hidden_dim,
        num_heads=num_heads,
        num_features=num_features,
        kernel_type=cast(KernelType, kernel_type),
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
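
Random Fourier features approximate a shift-invariant kernel by an inner product of explicit feature vectors. The sketch below shows the standard construction for a Gaussian kernel with unit bandwidth; the function name, the bandwidth, and the feature count are assumptions for illustration and do not describe the internals of `SpectralAttention`.

```python
import math

import torch


def random_fourier_features(
    x: torch.Tensor, omega: torch.Tensor, bias: torch.Tensor
) -> torch.Tensor:
    """Map x of shape (..., d) to features of shape (..., D) for a Gaussian kernel."""
    num_features = omega.shape[1]
    return math.sqrt(2.0 / num_features) * torch.cos(x @ omega + bias)


torch.manual_seed(0)
dim, num_features = 4, 4096
omega = torch.randn(dim, num_features)         # frequencies ~ N(0, I), unit bandwidth
bias = 2 * math.pi * torch.rand(num_features)  # phases ~ Uniform(0, 2*pi)

q, k = 0.5 * torch.randn(dim), 0.5 * torch.randn(dim)
phi_q = random_fourier_features(q, omega, bias)
phi_k = random_fourier_features(k, omega, bias)

approx = phi_q @ phi_k                          # feature-space inner product
exact = torch.exp(-0.5 * (q - k).pow(2).sum())  # Gaussian kernel value
```

The approximation error shrinks roughly as one over the square root of the number of features, which is the trade-off that `num_features` controls.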

WaveletBlock ¶

WaveletBlock(hidden_dim: int, wavelet: str = 'db4', levels: int = 3, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)

Bases: PreNormBlock

Wavelet transformer block with wavelet mixing.

This block uses discrete wavelet transforms for multiscale token mixing.

Parameters:

Name	Type	Description	Default
`hidden_dim`	`int`	Hidden dimension of the model.	required
`wavelet`	`str`	Type of wavelet. Default is 'db4'.	`'db4'`
`levels`	`int`	Number of decomposition levels. Default is 3.	`3`
`ffn_hidden_dim`	`int \| None`	Hidden dimension of the FFN. Default is 4 * hidden_dim.	`None`
`activation`	`str`	Activation function. Default is 'gelu'.	`'gelu'`
`dropout`	`float`	Dropout probability. Default is 0.0.	`0.0`
`norm_eps`	`float`	Epsilon for layer normalization. Default is 1e-12.	`1e-12`

Source code in spectrans/blocks/spectral.py

def __init__(
    self,
    hidden_dim: int,
    wavelet: str = "db4",
    levels: int = 3,
    ffn_hidden_dim: int | None = None,
    activation: str = "gelu",
    dropout: float = 0.0,
    norm_eps: float = 1e-12,
):
    mixing_layer = WaveletMixing(
        hidden_dim=hidden_dim,
        wavelet=cast(WaveletType, wavelet),
        levels=levels,
        dropout=dropout,
    )
    super().__init__(
        mixing_layer=mixing_layer,
        hidden_dim=hidden_dim,
        ffn_hidden_dim=ffn_hidden_dim,
        activation=cast(ActivationType, activation),
        dropout=dropout,
        norm_eps=norm_eps,
    )
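
To make the idea of a decomposition level concrete, the sketch below implements one level of the Haar wavelet transform and its inverse. Haar is used here instead of the default 'db4' because its filters have only two taps; the function names are hypothetical and this is not the `WaveletMixing` implementation.

```python
import math

import torch


def haar_dwt(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """One-level Haar DWT along the sequence dimension (dim=1, even length)."""
    even, odd = x[:, 0::2], x[:, 1::2]
    approx = (even + odd) / math.sqrt(2.0)  # low-pass: local averages
    detail = (even - odd) / math.sqrt(2.0)  # high-pass: local differences
    return approx, detail


def haar_idwt(approx: torch.Tensor, detail: torch.Tensor) -> torch.Tensor:
    """Inverse of haar_dwt: interleave the reconstructed even and odd samples."""
    even = (approx + detail) / math.sqrt(2.0)
    odd = (approx - detail) / math.sqrt(2.0)
    return torch.stack((even, odd), dim=2).flatten(1, 2)


torch.manual_seed(0)
x = torch.randn(2, 16, 4)  # (batch_size, sequence_length, hidden_dim)
approx, detail = haar_dwt(x)
reconstructed = haar_idwt(approx, detail)
```

A multi-level decomposition, as selected by `levels`, would apply `haar_dwt` again to the `approx` output, halving the sequence length at each level.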