Attention Configuration
spectrans.config.layers.attention
Configuration models for attention layer components.
This module provides Pydantic models for validating and typing configuration data used to construct attention layers in spectrans.
Classes:
| Name | Description |
|---|---|
| `SpectralAttentionConfig` | Configuration for Spectral Attention layer with Random Fourier Features. |
| `LSTAttentionConfig` | Configuration for Linear Spectral Transform Attention layer. |
| `DCTAttentionConfig` | Configuration for DCT-based attention layer. |
| `HadamardAttentionConfig` | Configuration for Hadamard-based attention layer. |
| `MixedTransformAttentionConfig` | Configuration for mixed transform attention layer. |
Notes
All configuration models use Pydantic v2 BaseModel for validation and type safety. Attention layer configurations inherit from AttentionLayerConfig in the parent core module.
Examples:
>>> from spectrans.config.layers.attention import SpectralAttentionConfig
>>> config = SpectralAttentionConfig(
...     hidden_dim=768,
...     num_heads=8,
...     num_features=256,
... )
>>> config.kernel_type
'softmax'
Classes
SpectralAttentionConfig
Bases: AttentionLayerConfig
Configuration for Spectral Attention with Random Fourier Features.
Attributes:
| Name | Type | Description |
|---|---|---|
| `num_features` | `int \| None` | Number of random Fourier features, defaults to None (uses head_dim). |
| `kernel_type` | `KernelType` | Type of kernel ('gaussian' or 'softmax'), defaults to 'softmax'. |
| `use_orthogonal` | `bool` | Whether to use orthogonal random features, defaults to True. |
| `feature_redraw` | `bool` | Whether to redraw features during training, defaults to False. |
| `use_bias` | `bool` | Whether to use bias in projections, defaults to True. |
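The `None` default for `num_features` falls back to the per-head dimension. The sketch below illustrates that fallback under the assumption that head_dim equals `hidden_dim // num_heads`. The helper `resolve_num_features` is hypothetical and is not part of spectrans.

```python
from __future__ import annotations


def resolve_num_features(
    hidden_dim: int, num_heads: int, num_features: int | None = None
) -> int:
    """Hypothetical helper: effective number of random Fourier features."""
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    # Assumption: head_dim is the hidden dimension split evenly across heads.
    head_dim = hidden_dim // num_heads
    return head_dim if num_features is None else num_features


print(resolve_num_features(768, 8))       # 96, falls back to head_dim
print(resolve_num_features(768, 8, 256))  # 256, explicit value wins
```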
LSTAttentionConfig
Bases: AttentionLayerConfig
Configuration for Linear Spectral Transform Attention.
Attributes:
| Name | Type | Description |
|---|---|---|
| `transform_type` | `TransformLSTType` | Type of spectral transform ('dct', 'dst', 'hadamard', 'mixed'), defaults to 'dct'. |
| `learnable_scale` | `bool` | Whether to use learnable diagonal scaling, defaults to True. |
| `normalize` | `bool` | Whether to normalize transform output, defaults to True. |
| `use_bias` | `bool` | Whether to use bias in projections, defaults to True. |
DCTAttentionConfig
Bases: AttentionLayerConfig
Configuration for DCT-based attention layer.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dct_type` | `int` | Type of DCT transform (typically 2), defaults to 2. |
| `learnable_scale` | `bool` | Whether to use learnable diagonal scaling, defaults to True. |
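For orientation, a DCT type of 2 conventionally refers to the DCT-II. The following pure-Python sketch computes the unnormalized DCT-II for illustration only. The function name `dct2` is hypothetical, and the implementation and normalization used inside spectrans may differ.

```python
from __future__ import annotations

import math


def dct2(x: list[float]) -> list[float]:
    """Unnormalized DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_points = len(x)
    return [
        sum(
            x[n] * math.cos(math.pi / n_points * (n + 0.5) * k)
            for n in range(n_points)
        )
        for k in range(n_points)
    ]


# A constant signal has energy only in the k = 0 coefficient;
# the remaining coefficients vanish up to floating-point error.
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
print(coeffs)
```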
HadamardAttentionConfig
Bases: AttentionLayerConfig
Configuration for Hadamard-based attention layer.
Attributes:
| Name | Type | Description |
|---|---|---|
| `scale_by_sqrt` | `bool` | Whether to scale by sqrt(n), defaults to True. |
| `learnable_scale` | `bool` | Whether to use learnable diagonal scaling, defaults to True. |
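To see why sqrt(n) scaling matters, the sketch below implements a fast Walsh-Hadamard transform in pure Python. It assumes that "scale by sqrt(n)" means dividing the output by sqrt(n), which makes the transform orthonormal and therefore its own inverse. The function `hadamard_transform` is hypothetical and is not the spectrans implementation.

```python
from __future__ import annotations

import math


def hadamard_transform(x: list[float], scale_by_sqrt: bool = True) -> list[float]:
    """Fast Walsh-Hadamard transform of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    if scale_by_sqrt:
        # Assumption: dividing by sqrt(n) yields an orthonormal transform.
        out = [v / math.sqrt(n) for v in out]
    return out


x = [1.0, 2.0, 3.0, 4.0]
# With the scaling applied, transforming twice recovers the input
# (up to floating-point error).
roundtrip = hadamard_transform(hadamard_transform(x))
print(roundtrip)
```

Without the scaling, applying the transform twice multiplies the input by n instead of returning it unchanged.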
MixedTransformAttentionConfig
Bases: AttentionLayerConfig
Configuration for mixed transform attention layer.
Attributes:
| Name | Type | Description |
|---|---|---|
| `use_fft` | `bool` | Whether to use FFT transforms, defaults to True. |
| `use_dct` | `bool` | Whether to use DCT transforms, defaults to True. |
| `use_hadamard` | `bool` | Whether to use Hadamard transforms, defaults to True. |
SpectralKernelAttentionConfig
Bases: AttentionLayerConfig
Configuration for spectral kernel attention.
Attributes:
| Name | Type | Description |
|---|---|---|
| `kernel_type` | `SpectralKernelType` | Type of spectral kernel ('gaussian', 'polynomial', 'spectral'), defaults to 'gaussian'. |
| `rank` | `int \| None` | Rank for low-rank approximation, defaults to None (uses min(64, head_dim)). |
| `num_features` | `int \| None` | Number of features for approximation, defaults to None. |
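The `rank` fallback of min(64, head_dim) can be worked through with a small sketch. It assumes head_dim equals `hidden_dim // num_heads`; the helper `resolve_rank` is hypothetical and is not part of spectrans.

```python
from __future__ import annotations


def resolve_rank(hidden_dim: int, num_heads: int, rank: int | None = None) -> int:
    """Hypothetical helper: effective rank of the low-rank approximation."""
    # Assumption: head_dim is the hidden dimension split evenly across heads.
    head_dim = hidden_dim // num_heads
    return min(64, head_dim) if rank is None else rank


print(resolve_rank(768, 8))      # 64, head_dim is 96 so the cap of 64 applies
print(resolve_rank(256, 8))      # 32, head_dim is 32 and sits below the cap
print(resolve_rank(768, 8, 16))  # 16, explicit value wins
```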