Attention Configuration

spectrans.config.layers.attention

Configuration models for attention layer components.

This module provides Pydantic models for validating and typing configuration data used to construct attention layers in spectrans.

Classes:

Name Description
SpectralAttentionConfig

Configuration for Spectral Attention layer with Random Fourier Features.

LSTAttentionConfig

Configuration for Linear Spectral Transform Attention layer.

DCTAttentionConfig

Configuration for DCT-based attention layer.

HadamardAttentionConfig

Configuration for Hadamard-based attention layer.

MixedTransformAttentionConfig

Configuration for mixed transform attention layer.

SpectralKernelAttentionConfig

Configuration for spectral kernel attention.

Notes

All configuration models are Pydantic v2 BaseModel subclasses, which gives them validation and type safety. Each attention layer configuration inherits from AttentionLayerConfig in the parent core module.

Examples:

>>> from spectrans.config.layers.attention import SpectralAttentionConfig
>>> config = SpectralAttentionConfig(
...     hidden_dim=768,
...     num_heads=8,
...     num_features=256
... )
>>> config.kernel_type
'softmax'

Classes

SpectralAttentionConfig

Bases: AttentionLayerConfig

Configuration for Spectral Attention with Random Fourier Features.

Attributes:

Name Type Description
num_features int | None

Number of random Fourier features, defaults to None (uses head_dim).

kernel_type KernelType

Type of kernel ('gaussian' or 'softmax'), defaults to 'softmax'.

use_orthogonal bool

Whether to use orthogonal random features, defaults to True.

feature_redraw bool

Whether to redraw features during training, defaults to False.

use_bias bool

Whether to use bias in projections, defaults to True.
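
To make the role of num_features more concrete, here is a minimal standard-library sketch of Random Fourier Features for the Gaussian kernel. It is not the spectrans implementation: the helper names (gaussian_kernel, draw_rff_params, rff_map) are hypothetical, and it covers only the 'gaussian' kernel type with plain, non-orthogonal random features.

```python
import math
import random


def gaussian_kernel(x, y):
    """Exact Gaussian kernel exp(-||x - y||^2 / 2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / 2.0)


def draw_rff_params(dim, num_features, rng):
    """Draw frequencies w ~ N(0, I) and phases b ~ U(0, 2*pi)."""
    weights = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(num_features)
    ]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return weights, phases


def rff_map(x, weights, phases):
    """Map x to sqrt(2 / D) * cos(w . x + b), one entry per random feature."""
    scale = math.sqrt(2.0 / len(weights))
    return [
        scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
        for w, b in zip(weights, phases)
    ]


# Hypothetical inputs; a fixed seed keeps the sketch reproducible.
rng = random.Random(0)
x = [0.3, -0.1, 0.5, 0.2]
y = [0.1, 0.4, -0.2, 0.0]
weights, phases = draw_rff_params(dim=len(x), num_features=4096, rng=rng)

# The dot product of the feature maps approximates the exact kernel value.
approx = sum(a * b for a, b in zip(rff_map(x, weights, phases),
                                   rff_map(y, weights, phases)))
exact = gaussian_kernel(x, y)
print(f"exact={exact:.4f} approx={approx:.4f}")
```

In this sketch, more features give a closer approximation at a higher cost per token, which is the trade-off that num_features controls.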

LSTAttentionConfig

Bases: AttentionLayerConfig

Configuration for Linear Spectral Transform Attention.

Attributes:

Name Type Description
transform_type TransformLSTType

Type of spectral transform ('dct', 'dst', 'hadamard', 'mixed'), defaults to 'dct'.

learnable_scale bool

Whether to use learnable diagonal scaling, defaults to True.

normalize bool

Whether to normalize transform output, defaults to True.

use_bias bool

Whether to use bias in projections, defaults to True.

DCTAttentionConfig

Bases: AttentionLayerConfig

Configuration for DCT-based attention layer.

Attributes:

Name Type Description
dct_type int

DCT variant to use (typically 2, i.e. DCT-II), defaults to 2.

learnable_scale bool

Whether to use learnable diagonal scaling, defaults to True.
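
As an illustration of what dct_type=2 and a diagonal scale refer to, the sketch below implements an orthonormal DCT-II, its inverse (DCT-III), and a transform, scale, inverse-transform round trip in plain Python. The function names are hypothetical, and the way the scale is applied is an assumption made for illustration, not a description of the spectrans layer.

```python
import math


def _norm(k, n):
    """Orthonormal scaling factor for coefficient k of a length-n DCT."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct2(x):
    """Orthonormal DCT-II of a sequence."""
    n = len(x)
    return [
        _norm(k, n) * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                          for i in range(n))
        for k in range(n)
    ]


def idct2(coeffs):
    """Inverse of dct2 (an orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(_norm(k, n) * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n))
        for i in range(n)
    ]


def scaled_dct_mix(x, scale):
    """Transform, multiply each coefficient by its scale, transform back.

    Assumption: `scale` stands in for a learnable diagonal; here it is a
    fixed list of floats with one entry per coefficient.
    """
    coeffs = dct2(x)
    return idct2([s * c for s, c in zip(scale, coeffs)])


signal = [1.0, 2.0, 3.0, 4.0]
identity = scaled_dct_mix(signal, [1.0] * len(signal))      # all-ones scale
low_pass = scaled_dct_mix(signal, [1.0, 1.0, 0.0, 0.0])     # drop high frequencies
print(identity)
print(low_pass)
```

With an all-ones scale the round trip reproduces the input up to floating-point error, so any mixing effect in this sketch comes entirely from the diagonal values.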

HadamardAttentionConfig

Bases: AttentionLayerConfig

Configuration for Hadamard-based attention layer.

Attributes:

Name Type Description
scale_by_sqrt bool

Whether to scale by sqrt(n), defaults to True.

learnable_scale bool

Whether to use learnable diagonal scaling, defaults to True.
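
The effect of scaling by sqrt(n) can be seen in a small fast Walsh-Hadamard transform. This is a hedged sketch, not the library's code: the function fwht is hypothetical, and it assumes n is the transform length, that the length is a power of two, and that the scaling divides the output by sqrt(n).

```python
import math


def fwht(values, scale_by_sqrt=True):
    """Fast Walsh-Hadamard transform of a power-of-two-length sequence."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    if scale_by_sqrt:
        # Dividing by sqrt(n) makes the transform orthonormal.
        root = math.sqrt(n)
        out = [v / root for v in out]
    return out


x = [1.0, 2.0, 3.0, 4.0]
print(fwht(fwht(x)))                # scaled: applying twice recovers x
print(fwht(fwht(x, False), False))  # unscaled: applying twice gives n * x
```

Under these assumptions the scaled transform is its own inverse, whereas the unscaled one multiplies the signal by n on every round trip.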

MixedTransformAttentionConfig

Bases: AttentionLayerConfig

Configuration for mixed transform attention layer.

Attributes:

Name Type Description
use_fft bool

Whether to use FFT transforms, defaults to True.

use_dct bool

Whether to use DCT transforms, defaults to True.

use_hadamard bool

Whether to use Hadamard transforms, defaults to True.

SpectralKernelAttentionConfig

Bases: AttentionLayerConfig

Configuration for spectral kernel attention.

Attributes:

Name Type Description
kernel_type SpectralKernelType

Type of spectral kernel ('gaussian', 'polynomial', 'spectral'), defaults to 'gaussian'.

rank int | None

Rank for low-rank approximation, defaults to None (uses min(64, head_dim)).

num_features int | None

Number of features for approximation, defaults to None.
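
The documented fallback for rank can be written out as a small helper. This is a sketch only: resolve_head_dim and resolve_rank are hypothetical names, and computing head_dim as hidden_dim // num_heads is the usual multi-head convention, assumed here rather than taken from the spectrans source.

```python
def resolve_head_dim(hidden_dim, num_heads):
    """Per-head width, assuming hidden_dim splits evenly across heads."""
    if hidden_dim % num_heads != 0:
        raise ValueError("hidden_dim must be divisible by num_heads")
    return hidden_dim // num_heads


def resolve_rank(rank, head_dim):
    """Apply the documented default: None means min(64, head_dim)."""
    return min(64, head_dim) if rank is None else rank


head_dim = resolve_head_dim(hidden_dim=768, num_heads=8)
print(head_dim)                      # 96
print(resolve_rank(None, head_dim))  # 64
print(resolve_rank(None, 32))        # 32
print(resolve_rank(16, head_dim))    # 16
```

An explicit rank always takes precedence in this sketch; the cap of 64 applies only when rank is left as None.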