Base Model Classes¶
spectrans.models.base ¶
Base model classes for spectral transformers.
This module provides the base model classes and common functionality for building complete spectral transformer models. The base classes handle common tasks like embeddings, positional encoding, and output projections, while allowing specific models to customize the core transformer blocks and mixing layers.
Classes:
| Name | Description |
|---|---|
| `BaseModel` | Abstract base class for all spectral transformer models. |
| `PositionalEncoding` | Sinusoidal positional encoding following the original Transformer paper. |
| `LearnedPositionalEncoding` | Learnable positional embeddings as an alternative to sinusoidal encoding. |
| `ClassificationHead` | Output head for classification tasks. |
| `RegressionHead` | Output head for regression tasks. |
| `SequenceHead` | Output head for sequence-to-sequence tasks. |
Examples:
Creating a custom model by extending BaseModel:
>>> import torch.nn as nn
>>> from spectrans.models.base import BaseModel
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> from spectrans.blocks.base import PreNormBlock
>>> class MyModel(BaseModel):
...     def build_blocks(self):
...         return nn.ModuleList([
...             PreNormBlock(
...                 mixing_layer=FourierMixing(self.hidden_dim),
...                 hidden_dim=self.hidden_dim,
...                 ffn_hidden_dim=self.ffn_hidden_dim,
...                 dropout=self.dropout
...             )
...             for _ in range(self.num_layers)
...         ])
Using positional encoding:
>>> import torch
>>> from spectrans.models.base import PositionalEncoding
>>> pos_encoder = PositionalEncoding(hidden_dim=768, max_sequence_length=1024)
>>> embeddings = torch.randn(32, 512, 768)  # (batch_size, sequence_length, hidden_dim)
>>> encoded = pos_encoder(embeddings)
Notes
The base model architecture follows the standard transformer pattern:
- Input embedding (optional)
- Positional encoding (optional)
- Stack of transformer blocks
- Output projection/head (task-specific)
All models support gradient checkpointing for memory-efficient training and can be easily configured through Pydantic configuration objects.
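The four stages above, together with the abstract `build_blocks` hook, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library's implementation: `TinyBase`, `TinyModel`, and the list-of-callables representation are hypothetical stand-ins for the real `nn.Module` machinery.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Optional

# Hypothetical: a stage maps one sequence (a list) to another sequence.
Stage = Callable[[list], list]


class TinyBase(ABC):
    """Toy stand-in for the embed -> encode -> blocks -> head pattern."""

    def __init__(self, embed: Optional[Stage] = None,
                 encode: Optional[Stage] = None,
                 head: Optional[Stage] = None) -> None:
        self.embed = embed    # optional input embedding
        self.encode = encode  # optional positional encoding
        self.head = head      # optional task-specific head
        self.blocks = self.build_blocks()  # supplied by the subclass

    @abstractmethod
    def build_blocks(self) -> List[Stage]:
        """Subclasses return the stack of blocks."""

    def forward(self, x: list) -> list:
        # Optional stages are skipped when they are not configured.
        for stage in (self.embed, self.encode):
            if stage is not None:
                x = stage(x)
        for block in self.blocks:
            x = block(x)
        return self.head(x) if self.head is not None else x


class TinyModel(TinyBase):
    def build_blocks(self) -> List[Stage]:
        # Two identical "blocks" that add 1 to every element.
        return [lambda xs: [v + 1 for v in xs] for _ in range(2)]


model = TinyModel(embed=lambda ids: [float(i) for i in ids])
print(model.forward([1, 2, 3]))
```

Instantiating `TinyBase` directly raises `TypeError`, which mirrors why a concrete model has to provide `build_blocks`.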
References
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NeurIPS 2017), pages 5998-6008.
Classes¶
BaseModel ¶
BaseModel(vocab_size: int | None = None, hidden_dim: int = 768, num_layers: int = 12, max_sequence_length: int = 512, num_classes: int | None = None, use_positional_encoding: bool = True, positional_encoding_type: PositionalEncodingType = 'sinusoidal', dropout: float = 0.1, ffn_hidden_dim: int | None = None, norm_eps: float = 1e-12, output_type: OutputHeadType = 'classification', gradient_checkpointing: bool = False)
Bases: SpectralComponent, ABC
Abstract base class for spectral transformer models.
This class provides the common functionality shared by all spectral transformer models, including embeddings, positional encoding, and output heads. Subclasses must implement the build_blocks method to define their specific architecture.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vocab_size` | `int \| None` | Size of the vocabulary for token embeddings. If None, no input embedding layer is created (assumes pre-embedded inputs). | `None` |
| `hidden_dim` | `int` | Hidden dimension size for the model. | `768` |
| `num_layers` | `int` | Number of transformer blocks in the model. | `12` |
| `max_sequence_length` | `int` | Maximum sequence length the model can process. | `512` |
| `num_classes` | `int \| None` | Number of output classes for classification. If None, no classification head is added. | `None` |
| `use_positional_encoding` | `bool` | Whether to use positional encoding. Default is True. | `True` |
| `positional_encoding_type` | `PositionalEncodingType` | Type of positional encoding: 'sinusoidal', 'learned', 'rotary', 'alibi', or 'none'. Default is 'sinusoidal'. | `'sinusoidal'` |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
| `ffn_hidden_dim` | `int \| None` | Hidden dimension for feedforward networks. If None, defaults to 4 * hidden_dim. | `None` |
| `norm_eps` | `float` | Epsilon for layer normalization. Default is 1e-12. | `1e-12` |
| `output_type` | `OutputHeadType` | Type of output head: 'classification', 'regression', 'sequence', 'lm', or 'none'. Default is 'classification'. | `'classification'` |
| `gradient_checkpointing` | `bool` | Whether to use gradient checkpointing for memory efficiency. Default is False. | `False` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `hidden_dim` | `int` | Hidden dimension size. |
| `num_layers` | `int` | Number of transformer blocks. |
| `max_sequence_length` | `int` | Maximum sequence length. |
| `embedding` | `Embedding \| None` | Token embedding layer (if vocab_size is provided). |
| `positional_encoding` | `PositionalEncoding \| LearnedPositionalEncoding \| None` | Positional encoding module. |
| `blocks` | `ModuleList` | List of transformer blocks. |
| `output_head` | `Module \| None` | Task-specific output head. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `build_blocks` | Build the transformer blocks for the model. |
| `forward` | Forward pass through the model. |
| `from_config` | Create model instance from configuration object. |
Functions¶
build_blocks (abstractmethod) ¶
Build the transformer blocks for the model.
This method must be implemented by subclasses to define the specific architecture using appropriate mixing layers.
Returns:
| Type | Description |
|---|---|
| `ModuleList` | List of transformer blocks. |
forward ¶
forward(input_ids: Tensor | None = None, inputs_embeds: Tensor | None = None, attention_mask: Tensor | None = None) -> Tensor
Forward pass through the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_ids` | `Tensor \| None` | Input token IDs of shape (batch_size, sequence_length). Required if embedding layer exists. | `None` |
| `inputs_embeds` | `Tensor \| None` | Pre-embedded inputs of shape (batch_size, sequence_length, hidden_dim). Used if no embedding layer or to bypass embedding. | `None` |
| `attention_mask` | `Tensor \| None` | Attention mask of shape (batch_size, sequence_length). Values should be 0 or 1 (1 for tokens to attend to). | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Output tensor. Shape depends on the output head. Classification: (batch_size, num_classes); Regression: (batch_size, 1); Sequence: (batch_size, sequence_length, vocab_size); None: (batch_size, sequence_length, hidden_dim). |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If neither input_ids nor inputs_embeds is provided. |
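The shape rules in the Returns table can be written down as a small lookup. The helper below is a hypothetical illustration (`expected_output_shape` is not part of the library); it only encodes the four shapes documented above and deliberately rejects anything else, including 'lm', whose output shape this page does not state.

```python
from typing import Optional, Tuple


def expected_output_shape(output_type: str, batch_size: int, sequence_length: int,
                          hidden_dim: int, num_classes: Optional[int] = None,
                          vocab_size: Optional[int] = None) -> Tuple[int, ...]:
    """Return the documented output shape for a given head type."""
    if output_type == "classification":
        if num_classes is None:
            raise ValueError("num_classes is required for a classification head")
        return (batch_size, num_classes)
    if output_type == "regression":
        return (batch_size, 1)
    if output_type == "sequence":
        if vocab_size is None:
            raise ValueError("vocab_size is required for a sequence head")
        return (batch_size, sequence_length, vocab_size)
    if output_type == "none":
        # No head: the hidden states are returned unchanged.
        return (batch_size, sequence_length, hidden_dim)
    raise ValueError(f"shape not documented for output_type={output_type!r}")


print(expected_output_shape("classification", 32, 512, 768, num_classes=10))
print(expected_output_shape("none", 32, 512, 768))
```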
from_config (classmethod) ¶
from_config(config: ModelConfig) -> BaseModel
Create model instance from configuration object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `ModelConfig` | Configuration object with model parameters. | required |
Returns:
| Type | Description |
|---|---|
| `BaseModel` | Configured model instance. |
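A `from_config` classmethod of this kind usually unpacks the configuration fields into constructor keyword arguments. The sketch below shows that pattern with a standard-library dataclass standing in for the Pydantic `ModelConfig`; `ToyConfig` and `ToyModel` are hypothetical names, and the real method may map fields differently.

```python
from dataclasses import asdict, dataclass


@dataclass
class ToyConfig:
    """Hypothetical stand-in for ModelConfig (the library uses Pydantic)."""
    hidden_dim: int = 768
    num_layers: int = 12
    dropout: float = 0.1


class ToyModel:
    def __init__(self, hidden_dim: int = 768, num_layers: int = 12,
                 dropout: float = 0.1) -> None:
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.dropout = dropout

    @classmethod
    def from_config(cls, config: ToyConfig) -> "ToyModel":
        # Turn the config fields into constructor keyword arguments.
        return cls(**asdict(config))


model = ToyModel.from_config(ToyConfig(hidden_dim=256, num_layers=4))
print(model.hidden_dim, model.num_layers, model.dropout)
```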
PositionalEncoding ¶
Bases: Module
Sinusoidal positional encoding.
This module adds sinusoidal positional encodings to embeddings, following the approach in "Attention is All You Need".
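For reference, the encoding from "Attention is All You Need" sets PE(pos, 2i) = sin(pos / 10000^(2i/d)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d)). The sketch below computes that table in plain Python to make the arithmetic explicit; it is not the module's code, the function names are hypothetical, an even `hidden_dim` is assumed, and dropout is omitted.

```python
import math
from typing import List


def sinusoidal_table(max_len: int, hidden_dim: int) -> List[List[float]]:
    """Build PE with sin on even indices and cos on odd indices."""
    if hidden_dim % 2 != 0:
        raise ValueError("this sketch assumes an even hidden_dim")
    table = []
    for pos in range(max_len):
        row = []
        # i walks over the even indices, so i / hidden_dim equals 2i'/d.
        for i in range(0, hidden_dim, 2):
            angle = pos / (10000.0 ** (i / hidden_dim))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def add_positions(x: List[List[float]], table: List[List[float]]) -> List[List[float]]:
    """Add the first len(x) rows of the table to one sequence of embeddings."""
    return [[v + p for v, p in zip(vec, table[pos])] for pos, vec in enumerate(x)]


pe = sinusoidal_table(max_len=4, hidden_dim=4)
encoded = add_positions([[0.0] * 4 for _ in range(3)], pe)
print(pe[0])  # position 0: every sine is 0 and every cosine is 1
```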
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Dimension of the embeddings. | required |
| `max_sequence_length` | `int` | Maximum sequence length to encode. | `5000` |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `dropout` | `Dropout` | Dropout layer. |
| `pe` | `Tensor` | Precomputed positional encodings. |
Methods:
| Name | Description |
|---|---|
| `forward` | Add positional encoding to input tensor. |
Functions¶
forward ¶
Add positional encoding to input tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Tensor with positional encoding added. |
LearnedPositionalEncoding ¶
Bases: Module
Learned positional embeddings.
This module uses learnable positional embeddings instead of fixed sinusoidal encodings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Dimension of the embeddings. | required |
| `max_sequence_length` | `int` | Maximum sequence length to encode. | `5000` |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `position_embeddings` | `Embedding` | Learnable position embeddings. |
| `dropout` | `Dropout` | Dropout layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Add learned positional embeddings to input tensor. |
Functions¶
forward ¶
Add learned positional embeddings to input tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Tensor with positional embeddings added. |
RotaryPositionalEncoding ¶
Bases: Module
Rotary Position Embedding (RoPE).
This module implements Rotary Position Embeddings as described in the RoFormer paper. RoPE encodes absolute position with a rotation matrix and naturally incorporates relative position dependency into the self-attention formulation.
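The relative-position property can be checked numerically. The sketch below rotates consecutive pairs (x[2i], x[2i+1]) by the angle position * base^(-2i/d); this interleaved pairing is an assumption (some implementations pair the two halves of the vector instead), and `rope` and `dot` are hypothetical helper names rather than the module's API. With an `offset`, the position of a token would simply be offset plus its index.

```python
import math
from typing import List


def rope(x: List[float], position: int, base: float = 10000.0) -> List[float]:
    """Rotate each consecutive pair of x by position * base^(-2i/d)."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("hidden_dim must be even")
    out = []
    # i walks over the even indices, so i / d equals 2i'/d.
    for i in range(0, d, 2):
        angle = position * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
# The same relative offset (3) at different absolute positions.
score_a = dot(rope(q, 5), rope(k, 2))
score_b = dot(rope(q, 13), rope(k, 10))
print(abs(score_a - score_b) < 1e-9)
```

The two scores agree up to floating-point error because a dot product of two rotated vectors depends only on the difference of their rotation angles.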
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Dimension of the embeddings. Must be even. | required |
| `max_sequence_length` | `int` | Maximum sequence length to encode. | `5000` |
| `base` | `float` | Base for the frequency calculation. Default is 10000. | `10000.0` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `inv_freq` | `Tensor` | Inverse frequencies for computing rotary embeddings. |
| `cos_cached` | `Tensor \| None` | Cached cosine values for positions. |
| `sin_cached` | `Tensor \| None` | Cached sine values for positions. |
References
Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. 2024. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063.
Methods:
| Name | Description |
|---|---|
| `forward` | Apply rotary position embedding to input tensor. |
Functions¶
forward ¶
Apply rotary position embedding to input tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input tensor of shape (batch_size, num_heads, sequence_length, head_dim) or (batch_size, sequence_length, hidden_dim). | required |
| `offset` | `int` | Position offset for incremental decoding. Default is 0. | `0` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Tensor with rotary position embeddings applied. |
ALiBiPositionalBias ¶
Bases: Module
Attention with Linear Biases (ALiBi) positional encoding.
This module implements ALiBi, which adds a linear bias to attention scores based on the relative distance between tokens. Unlike traditional position embeddings, ALiBi enables extrapolation to sequences longer than those seen during training.
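As a rough illustration of the bias, the sketch below builds head-specific slopes and a distance-proportional penalty in plain Python. Two assumptions are made that this page does not confirm: the slopes follow the geometric sequence from the ALiBi paper for a power-of-two head count, and the bias is the symmetric form -slope * |i - j| (the paper itself describes the causal case). `alibi_slopes` and `alibi_bias` are hypothetical names.

```python
from typing import List


def alibi_slopes(num_heads: int) -> List[float]:
    """Geometric slopes 2^(-8/n), 2^(-16/n), ... for a power-of-two head count."""
    if num_heads <= 0 or num_heads & (num_heads - 1):
        raise ValueError("this sketch only handles power-of-two head counts")
    return [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]


def alibi_bias(num_heads: int, seq_len: int) -> List[List[List[List[float]]]]:
    """Nested lists of shape (1, num_heads, seq_len, seq_len)."""
    return [[
        # Penalty grows linearly with the distance between query i and key j.
        [[-slope * abs(i - j) for j in range(seq_len)] for i in range(seq_len)]
        for slope in alibi_slopes(num_heads)
    ]]


bias = alibi_bias(num_heads=8, seq_len=4)
print(alibi_slopes(8)[0], bias[0][0][3][0])
```

Because the bias depends only on the distance between positions, the same rule applies unchanged to any sequence length, which is what makes extrapolation possible.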
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `num_heads` | `int` | Number of attention heads. | required |
| `max_sequence_length` | `int` | Maximum sequence length to encode. | `5000` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `num_heads` | `int` | Number of attention heads. |
| `slopes` | `Tensor` | Head-specific slope parameters. |
| `alibi` | `Tensor \| None` | Cached linear bias matrix. |
References
Ofir Press, Noah A. Smith, and Mike Lewis. 2022. Train short, test long: Attention with linear biases enables input length extrapolation. In Proceedings of the International Conference on Learning Representations (ICLR).
Methods:
| Name | Description |
|---|---|
| `forward` | Add ALiBi bias to attention scores. |
| `get_bias` | Get ALiBi bias matrix for a given sequence length. |
Functions¶
forward ¶
Add ALiBi bias to attention scores.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `attention_scores` | `Tensor` | Attention scores of shape (batch_size, num_heads, seq_len, seq_len). | required |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Attention scores with ALiBi bias added. |
get_bias ¶
Get ALiBi bias matrix for a given sequence length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `seq_len` | `int` | Sequence length. | required |
| `device` | `device \| None` | Device to place the bias tensor. | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | ALiBi bias of shape (1, num_heads, seq_len, seq_len). |
ClassificationHead ¶
ClassificationHead(hidden_dim: int, num_classes: int, dropout: float = 0.1, pooling: PoolingType = 'cls')
Bases: Module
Classification output head.
This module pools sequence outputs and projects to class logits.
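The three pooling strategies can be illustrated on a single example in plain Python. This is a sketch of the general technique, not the head's actual code: `pool` is a hypothetical helper, 'cls' is assumed to mean "take the first position", and masked positions (mask value 0) are assumed to be excluded from 'mean' and 'max'.

```python
from typing import List, Optional

States = List[List[float]]  # (sequence_length, hidden_dim) for one example


def pool(hidden: States, strategy: str = "cls",
         mask: Optional[List[int]] = None) -> List[float]:
    """Reduce (sequence_length, hidden_dim) to (hidden_dim,)."""
    if strategy == "cls":
        return list(hidden[0])  # assumes the first token summarises the sequence
    if mask is None:
        mask = [1] * len(hidden)
    # Keep only the positions the mask marks as real tokens.
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not kept:
        raise ValueError("mask removes every token")
    if strategy == "mean":
        return [sum(col) / len(kept) for col in zip(*kept)]
    if strategy == "max":
        return [max(col) for col in zip(*kept)]
    raise ValueError(f"unknown pooling strategy: {strategy!r}")


states = [[1.0, 4.0], [3.0, 0.0], [100.0, 100.0]]  # last position is padding
mask = [1, 1, 0]
print(pool(states, "cls"), pool(states, "mean", mask), pool(states, "max", mask))
```

The pooled vector would then pass through dropout and the final linear projection to produce one logit per class.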
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Input hidden dimension. | required |
| `num_classes` | `int` | Number of output classes. | required |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
| `pooling` | `PoolingType` | Pooling strategy: 'cls', 'mean', or 'max'. Default is 'cls'. | `'cls'` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `pooling` | `PoolingType` | Pooling strategy. |
| `dropout` | `Dropout` | Dropout layer. |
| `classifier` | `Linear` | Output projection layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through classification head. |
Functions¶
forward ¶
Forward pass through classification head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_states` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
| `attention_mask` | `Tensor \| None` | Attention mask for pooling operations. | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Classification logits of shape (batch_size, num_classes). |
RegressionHead ¶
Bases: Module
Regression output head.
This module pools sequence outputs and projects to a scalar value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Input hidden dimension. | required |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
| `pooling` | `PoolingType` | Pooling strategy: 'cls', 'mean', or 'max'. Default is 'mean'. | `'mean'` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `pooling` | `PoolingType` | Pooling strategy. |
| `dropout` | `Dropout` | Dropout layer. |
| `regressor` | `Linear` | Output projection layer. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through regression head. |
Functions¶
forward ¶
Forward pass through regression head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_states` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
| `attention_mask` | `Tensor \| None` | Attention mask for pooling operations. | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Regression output of shape (batch_size, 1). |
SequenceHead ¶
Bases: Module
Sequence-to-sequence output head.
This module projects hidden states to vocabulary logits for sequence generation tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_dim` | `int` | Input hidden dimension. | required |
| `vocab_size` | `int` | Output vocabulary size. | required |
| `dropout` | `float` | Dropout probability. Default is 0.1. | `0.1` |
Attributes:
| Name | Type | Description |
|---|---|---|
| `dropout` | `Dropout` | Dropout layer. |
| `lm_head` | `Linear` | Language modeling head. |
Methods:
| Name | Description |
|---|---|
| `forward` | Forward pass through sequence head. |
Functions¶
forward ¶
Forward pass through sequence head.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hidden_states` | `Tensor` | Input tensor of shape (batch_size, sequence_length, hidden_dim). | required |
| `attention_mask` | `Tensor \| None` | Not used, kept for interface consistency. | `None` |
Returns:
| Type | Description |
|---|---|
| `Tensor` | Vocabulary logits of shape (batch_size, sequence_length, vocab_size). |