Core Base Classes¶
spectrans.core.base ¶
Base classes and interfaces for spectral transformer components.
This module defines the core abstract base classes and interfaces that all spectral transformer components must implement. These classes establish consistent APIs for forward propagation and component composition throughout the spectrans library.
The inheritance hierarchy provides both mathematical rigor and software engineering best practices, ensuring that all spectral transforms maintain proper interfaces while allowing for flexible implementation strategies.
Classes:
| Name | Description |
|---|---|
SpectralComponent |
Abstract base class requiring forward() implementation. |
SpectralTransform |
Interface for spectral transforms with transform/inverse_transform methods. |
MixingLayer |
Base class for token mixing layers with dropout support. |
AttentionLayer |
Base class for attention mechanisms with multi-head support. |
TransformerBlock |
Complete transformer block with mixing, FFN, and residual connections. |
BaseModel |
Full model class with embedding, positional encoding, and classification. |
Examples:
Implementing a custom spectral component:
>>> import torch.nn as nn
>>> from spectrans.core.base import SpectralComponent
>>> class CustomComponent(SpectralComponent):
... def forward(self, x):
... return x * 2 # Simple scaling
Building a transformer block:
>>> import torch.nn as nn
>>> from spectrans.core.base import TransformerBlock
>>> from spectrans.layers.mixing.fourier import FourierMixing
>>> mixing_layer = FourierMixing(hidden_dim=768)
>>> ffn = nn.Sequential(nn.Linear(768, 3072), nn.GELU(), nn.Linear(3072, 768))
>>> block = TransformerBlock(mixing_layer, ffn)
Notes
The base classes implement several key design patterns:
- Template Method Pattern: TransformerBlock defines the structure while allowing flexible mixing layer implementations
- Strategy Pattern: Different spectral transforms can be swapped via the same interface
- Composition over Inheritance: Complex behaviors built by composing simple components
Mathematical Properties: - All spectral components preserve tensor shapes in the sequence dimension - Residual connections maintain gradient flow and training stability - Dropout is applied consistently after each sub-layer - Layer normalization follows the pre-norm architecture pattern
The TransformerBlock implements the standard architecture:
See Also
spectrans.core.types : Type definitions used in base classes spectrans.transforms.base : Transform-specific base classes
Classes¶
SpectralComponent ¶
Bases: Module, ABC
Base class for all spectral components.
This abstract base class defines the interface that all spectral transformer components must implement.
Methods:
| Name | Description |
|---|---|
forward |
Forward pass. |
Functions¶
forward
abstractmethod
¶
Forward pass.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Tensor
|
Input tensor of shape (batch_size, sequence_length, hidden_dim). |
required |
*args
|
Any
|
Additional positional arguments for subclass-specific parameters. |
()
|
**kwargs
|
Any
|
Additional keyword arguments for subclass-specific parameters. |
{}
|
Returns:
| Type | Description |
|---|---|
Tensor | tuple[Tensor, ...]
|
Output tensor(s). Single tensor for most cases, tuple for attention layers that optionally return attention weights. |
Source code in spectrans/core/base.py
AttentionLayer ¶
Bases: SpectralComponent
Base class for attention layers.
Attention layers implement various forms of spectral attention mechanisms as alternatives to standard multi-head attention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
hidden_dim
|
int
|
Hidden dimension of the model. |
required |
num_heads
|
int
|
Number of attention heads. |
1
|
dropout
|
float
|
Dropout probability. |
0.0
|
Attributes:
| Name | Type | Description |
|---|---|---|
hidden_dim |
int
|
Hidden dimension of the model. |
num_heads |
int
|
Number of attention heads. |
dropout |
Module
|
Dropout layer or identity if dropout is 0. |
Source code in spectrans/core/base.py
TransformerBlock ¶
TransformerBlock(mixing_layer: MixingLayer | AttentionLayer, ffn: Module | None = None, norm_layer: type[Module] = LayerNorm, dropout: float = 0.0)
Bases: SpectralComponent
Base class for transformer blocks.
Transformer blocks combine mixing/attention layers with feedforward networks and normalization to form complete transformer layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
mixing_layer
|
MixingLayer | AttentionLayer
|
The mixing or attention layer for token interactions. |
required |
ffn
|
Module | None
|
Feedforward network module. If None, no FFN is used. |
None
|
norm_layer
|
type[Module]
|
Normalization layer class to use. |
nn.LayerNorm
|
dropout
|
float
|
Dropout probability for residual connections. |
0.0
|
Attributes:
| Name | Type | Description |
|---|---|---|
mixing_layer |
MixingLayer | AttentionLayer
|
The mixing or attention layer. |
ffn |
Module | None
|
Feedforward network module. |
hidden_dim |
int
|
Hidden dimension extracted from mixing layer. |
norm1 |
Module
|
First normalization layer. |
norm2 |
Module | None
|
Second normalization layer (if FFN is used). |
dropout |
Module
|
Dropout layer for residual connections. |
Methods:
| Name | Description |
|---|---|
forward |
Forward pass through transformer block. |
Source code in spectrans/core/base.py
Functions¶
forward ¶
Forward pass through transformer block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Tensor
|
Input tensor of shape (batch_size, sequence_length, hidden_dim). |
required |
Returns:
| Type | Description |
|---|---|
Tensor
|
Output tensor of same shape as input. |
Source code in spectrans/core/base.py
BaseModel ¶
BaseModel(num_layers: int, hidden_dim: int, max_seq_length: int = 512, vocab_size: int | None = None, num_classes: int | None = None, dropout: float = 0.0)
Bases: Module
Base class for spectral transformer models.
This class provides common functionality for all spectral transformer model variants.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
num_layers
|
int
|
Number of transformer layers. |
required |
hidden_dim
|
int
|
Hidden dimension of the model. |
required |
max_seq_length
|
int
|
Maximum sequence length. |
512
|
vocab_size
|
int | None
|
Vocabulary size for embedding layer. If None, no embedding is used. |
None
|
num_classes
|
int | None
|
Number of output classes. If None, no classification head is used. |
None
|
dropout
|
float
|
Dropout probability. |
0.0
|
Attributes:
| Name | Type | Description |
|---|---|---|
num_layers |
int
|
Number of transformer layers. |
hidden_dim |
int
|
Hidden dimension of the model. |
max_seq_length |
int
|
Maximum sequence length. |
vocab_size |
int | None
|
Vocabulary size. |
num_classes |
int | None
|
Number of output classes. |
embedding |
Embedding | None
|
Optional embedding layer. |
pos_embedding |
Parameter
|
Positional embedding parameters. |
dropout |
Module
|
Dropout layer. |
classifier |
Linear | None
|
Optional classification head. |
blocks |
ModuleList
|
List of transformer blocks (populated by subclasses). |
Methods:
| Name | Description |
|---|---|
forward |
Forward pass through the model. |
Source code in spectrans/core/base.py
Functions¶
forward ¶
Forward pass through the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
x
|
Tensor
|
Input tensor. Shape depends on whether embedding is used: - With embedding: (batch_size, sequence_length) containing token indices - Without embedding: (batch_size, sequence_length, hidden_dim) |
required |
mask
|
Tensor | None
|
Optional attention mask of shape (batch_size, sequence_length). |
None
|
Returns:
| Type | Description |
|---|---|
Tensor
|
Output tensor. Shape depends on whether classifier is used: - With classifier: (batch_size, num_classes) - Without classifier: (batch_size, sequence_length, hidden_dim) |