Base Block Classes
spectrans.blocks.base
Base classes and interfaces for transformer blocks.
This module provides the base classes and interfaces for building transformer blocks in the spectrans library. Transformer blocks are composed of mixing/attention layers followed by feedforward networks, with residual connections and normalization.
Classes:
| Name | Description |
|---|---|
| `TransformerBlock` | Base class for all transformer blocks. |
| `FeedForwardNetwork` | Standard feedforward network with configurable activation. |
| `PreNormBlock` | Transformer block with pre-layer normalization. |
| `PostNormBlock` | Transformer block with post-layer normalization. |
| `ParallelBlock` | Transformer block with parallel mixing and FFN branches. |
Examples:
Creating a custom transformer block:
>>> from spectrans.blocks.base import TransformerBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> block = TransformerBlock(
...     mixing_layer=FourierMixing(hidden_dim=768),
...     hidden_dim=768,
...     use_pre_norm=True,
... )
Notes
The transformer block architecture follows the standard pattern:
- Mixing/Attention layer with residual connection
- Feedforward network with residual connection
- Layer normalization (pre-norm or post-norm)
- Optional dropout for regularization
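The pattern above can be sketched in plain PyTorch. This is a minimal illustration, not the spectrans source: the class name `SketchBlock`, the fixed 4x FFN expansion, and the exact placement of dropout are assumptions made for the example.

```python
import torch
from torch import nn


class SketchBlock(nn.Module):
    """Hypothetical stand-in for a sequential transformer block (not the spectrans source)."""

    def __init__(self, mixing_layer: nn.Module, hidden_dim: int,
                 dropout: float = 0.0, use_pre_norm: bool = True,
                 norm_eps: float = 1e-12) -> None:
        super().__init__()
        self.mixing_layer = mixing_layer
        # Two-layer MLP with the conventional 4x expansion (assumed here).
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )
        self.norm1 = nn.LayerNorm(hidden_dim, eps=norm_eps)
        self.norm2 = nn.LayerNorm(hidden_dim, eps=norm_eps)
        self.dropout = nn.Dropout(dropout)
        self.use_pre_norm = use_pre_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_pre_norm:
            # Pre-norm: normalize each sublayer's input, then add the residual.
            x = x + self.dropout(self.mixing_layer(self.norm1(x)))
            x = x + self.dropout(self.ffn(self.norm2(x)))
        else:
            # Post-norm: add the residual first, then normalize the sum.
            x = self.norm1(x + self.dropout(self.mixing_layer(x)))
            x = self.norm2(x + self.dropout(self.ffn(x)))
        return x


# nn.Identity stands in for a real mixing layer to keep the sketch self-contained.
x = torch.randn(2, 16, 64)
pre = SketchBlock(nn.Identity(), hidden_dim=64, use_pre_norm=True)
post = SketchBlock(nn.Identity(), hidden_dim=64, use_pre_norm=False)
pre_out, post_out = pre(x), post(x)
```

Both orderings preserve the input shape. In the post-norm variant the last operation is a layer normalization, so the output is normalized along the hidden dimension; in the pre-norm variant the residual stream itself is never normalized.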
References
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pages 5998-6008.
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. 2020. On layer normalization in the transformer architecture. In Proceedings of the 37th International Conference on Machine Learning (ICML 2020), pages 10524-10533.
Classes
TransformerBlock
TransformerBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, use_pre_norm: bool = True, norm_eps: float = 1e-12)
Bases: SpectralComponent
Base class for transformer blocks.
A transformer block combines a mixing/attention layer with a feedforward network, using residual connections and layer normalization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer for token interaction. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the feedforward network. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function for the FFN. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `use_pre_norm` | `bool` | Whether to use pre-layer normalization. Default is True. | `True` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. |
| `ffn` | `FeedForwardNetwork \| None` | The feedforward network. |
| `norm1` | `LayerNorm` | First layer normalization. |
| `norm2` | `LayerNorm \| None` | Second layer normalization (if FFN is used). |
| `dropout` | `Dropout` | Dropout layer. |
| `use_pre_norm` | `bool` | Whether pre-normalization is used. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the transformer block. |
Source code in spectrans/blocks/base.py
Functions
forward
Forward pass through the transformer block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |
FeedForwardNetwork
FeedForwardNetwork(hidden_dim: int, ffn_hidden_dim: int, activation: str = 'gelu', dropout: float = 0.0)
Bases: Module
Standard feedforward network for transformer blocks.
A two-layer MLP with configurable activation function and dropout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Input and output dimension. | required |
| `ffn_hidden_dim` | `int` | Hidden dimension of the FFN. | required |
| `activation` | `str` | Activation function name. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `fc1` | `Linear` | First linear layer. |
| `fc2` | `Linear` | Second linear layer. |
| `activation` | `Module` | Activation function. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the FFN. |
Functions
forward
Forward pass through the FFN.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (..., hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (..., hidden_dim). |
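A two-layer MLP with the attributes listed above (`fc1`, `fc2`, `activation`, `dropout`) might look roughly like the following. This is a sketch under assumptions, not the library's code: the class name `SketchFFN`, the `ACTIVATIONS` lookup table, and the points at which dropout is applied are all hypothetical.

```python
import torch
from torch import nn

# Hypothetical name-to-module table; the names spectrans accepts may differ.
ACTIVATIONS = {"gelu": nn.GELU, "relu": nn.ReLU, "silu": nn.SiLU}


class SketchFFN(nn.Module):
    """Illustrative two-layer MLP: hidden_dim -> ffn_hidden_dim -> hidden_dim."""

    def __init__(self, hidden_dim: int, ffn_hidden_dim: int,
                 activation: str = "gelu", dropout: float = 0.0) -> None:
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, ffn_hidden_dim)
        self.fc2 = nn.Linear(ffn_hidden_dim, hidden_dim)
        self.activation = ACTIVATIONS[activation]()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # nn.Linear acts on the last dimension only, so any leading
        # shape (...) passes through unchanged.
        x = self.dropout(self.activation(self.fc1(x)))
        # Applying dropout after the second projection is an assumption.
        return self.dropout(self.fc2(x))


ffn = SketchFFN(hidden_dim=64, ffn_hidden_dim=256)
out_3d = ffn(torch.randn(2, 16, 64))  # (batch, sequence, hidden)
out_2d = ffn(torch.randn(5, 64))      # works for any leading shape
```

Because both projections operate on the last dimension, the same module handles batched sequences and flat batches alike, which matches the `(..., hidden_dim)` shape contract documented for `forward`.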
PreNormBlock
PreNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: TransformerBlock
Transformer block with pre-layer normalization.
This block applies layer normalization before the mixing layer and the FFN, an ordering that Xiong et al. (2020) report improves training stability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
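The `PreNormBlock` signature is the `TransformerBlock` signature without `use_pre_norm`, and the same holds for `PostNormBlock`. One plausible reading is that each subclass simply fixes that flag. The sketch below illustrates the idea with hypothetical classes (`SketchBase`, `SketchPreNorm`, `SketchPostNorm`); it models only the mixing sublayer and is not taken from the spectrans source.

```python
import torch
from torch import nn


class SketchBase(nn.Module):
    """Hypothetical base that models the mixing sublayer only, for brevity."""

    def __init__(self, mixing_layer: nn.Module, hidden_dim: int,
                 use_pre_norm: bool = True, norm_eps: float = 1e-12) -> None:
        super().__init__()
        self.mixing_layer = mixing_layer
        self.norm1 = nn.LayerNorm(hidden_dim, eps=norm_eps)
        self.use_pre_norm = use_pre_norm

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.use_pre_norm:
            return x + self.mixing_layer(self.norm1(x))
        return self.norm1(x + self.mixing_layer(x))


class SketchPreNorm(SketchBase):
    """Same arguments as the base, with the normalization order fixed to pre-norm."""

    def __init__(self, mixing_layer: nn.Module, hidden_dim: int,
                 norm_eps: float = 1e-12) -> None:
        super().__init__(mixing_layer, hidden_dim,
                         use_pre_norm=True, norm_eps=norm_eps)


class SketchPostNorm(SketchBase):
    """Same arguments as the base, with the normalization order fixed to post-norm."""

    def __init__(self, mixing_layer: nn.Module, hidden_dim: int,
                 norm_eps: float = 1e-12) -> None:
        super().__init__(mixing_layer, hidden_dim,
                         use_pre_norm=False, norm_eps=norm_eps)


x = torch.randn(2, 16, 64)
pre_block = SketchPreNorm(nn.Identity(), hidden_dim=64)
post_block = SketchPostNorm(nn.Identity(), hidden_dim=64)
pre_out = pre_block(x)
post_out = post_block(x)
```

With this design, callers choose the normalization order by picking a class rather than passing a boolean, which keeps configuration errors out of model definitions.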
PostNormBlock
PostNormBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: TransformerBlock
Transformer block with post-layer normalization.
This block applies layer normalization after the mixing layer and the FFN, following the original transformer architecture (Vaswani et al., 2017).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
ParallelBlock
ParallelBlock(mixing_layer: MixingLayer | Module, hidden_dim: int, ffn_hidden_dim: int | None = None, activation: str = 'gelu', dropout: float = 0.0, norm_eps: float = 1e-12)
Bases: SpectralComponent
Transformer block with parallel mixing and FFN branches.
This block applies the mixing layer and the FFN in parallel rather than sequentially. Neither branch depends on the other's output, which can improve efficiency, and the arrangement has been shown to work well in practice.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. | required |
| `hidden_dim` | `int` | Hidden dimension of the model. | required |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension of the FFN. Default is 4 * hidden_dim. | `None` |
| `activation` | `str` | Activation function. Default is 'gelu'. | `'gelu'` |
| `dropout` | `float` | Dropout probability. Default is 0.0. | `0.0` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `mixing_layer` | `MixingLayer \| Module` | The mixing or attention layer. |
| `ffn` | `FeedForwardNetwork` | The feedforward network. |
| `norm` | `LayerNorm` | Layer normalization. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through the parallel block. |
Functions
forward
Forward pass through the parallel block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor of shape (batch_size, sequence_length, hidden_dim). |