Initialization Utilities
spectrans.utils.initialization
Weight initialization utilities for spectral transformer components.
This module provides specialized initialization schemes tailored for spectral neural networks, including complex-valued parameters, frequency-domain aware initialization, and transform-specific initialization strategies. Proper initialization is crucial for spectral transformers due to their unique mathematical properties and parameter scaling requirements.
The initialization functions account for the specific characteristics of spectral transforms, including orthogonality constraints, complex number scaling, frequency domain properties, and stability requirements for gradient-based optimization.
Functions:
| Name | Description |
|---|---|
| spectral_init | General-purpose spectral parameter initialization. |
| frequency_init | Initialize parameters with frequency-domain properties. |
| complex_xavier_init | Xavier/Glorot initialization for complex-valued parameters. |
| complex_kaiming_init | Kaiming/He initialization for complex parameters. |
| complex_normal_init | Normal initialization for complex tensors. |
| orthogonal_spectral_init | Orthogonal initialization preserving spectral properties. |
| xavier_spectral_init | Xavier initialization adapted for spectral transforms. |
| kaiming_spectral_init | Kaiming initialization adapted for spectral transforms. |
| dct_init | Specialized initialization for DCT parameters. |
| hadamard_init | Initialization for Hadamard transform parameters. |
| wavelet_init | Initialize parameters for wavelet transforms. |
| init_linear_spectral | Initialize linear layers for spectral operations. |
| init_conv_spectral | Initialize convolutional layers for spectral operations. |
Examples:
Basic spectral initialization:
>>> import torch
>>> import torch.nn as nn
>>> from spectrans.utils.initialization import spectral_init, complex_xavier_init
>>> # Initialize a linear layer for spectral transforms
>>> linear = nn.Linear(512, 512)
>>> _ = spectral_init(linear.weight, mode='xavier', gain=1.0)
>>> _ = nn.init.zeros_(linear.bias)
Complex parameter initialization:
>>> # Initialize complex-valued parameters
>>> complex_weights = torch.empty(256, 256, dtype=torch.complex64)
>>> complex_xavier_init(complex_weights, gain=1.0)
>>>
>>> # Manual complex initialization: each part gets 1/sqrt(2) of the real-valued gain
>>> import math
>>> real_part = torch.empty(256, 256)
>>> imag_part = torch.empty(256, 256)
>>> _ = torch.nn.init.xavier_uniform_(real_part, gain=1.0 / math.sqrt(2))
>>> _ = torch.nn.init.xavier_uniform_(imag_part, gain=1.0 / math.sqrt(2))
>>> complex_weights = torch.complex(real_part, imag_part)
Transform-specific initialization:
>>> from spectrans.utils.initialization import dct_init, hadamard_init, wavelet_init
>>> # DCT parameter initialization
>>> dct_params = torch.empty(512, 512)
>>> _ = dct_init(dct_params)
>>>
>>> # Hadamard transform parameters
>>> hadamard_params = torch.empty(256, 256)  # Must be square with a power-of-2 size
>>> _ = hadamard_init(hadamard_params)
>>>
>>> # Wavelet parameters
>>> wavelet_params = torch.empty(1024, 1024)
>>> _ = wavelet_init(wavelet_params, wavelet_type='db4')
Layer initialization:
>>> from spectrans.utils.initialization import init_linear_spectral, init_conv_spectral
>>> # Initialize entire layers
>>> linear_layer = nn.Linear(768, 768)
>>> _ = init_linear_spectral(linear_layer, method='xavier')
>>>
>>> # Convolutional layer for spectral processing
>>> conv_layer = nn.Conv1d(512, 512, kernel_size=3)
>>> _ = init_conv_spectral(conv_layer, method='kaiming')
Notes
Initialization Theory for Spectral Networks:
Complex Parameter Scaling: Complex parameters require careful scaling to maintain proper variance:
- Real and imaginary parts should be scaled by 1/√2 relative to real-valued case
- This maintains the same total variance while distributing it across both components
- Critical for stable training of complex neural networks
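The 1/√2 scaling above can be checked numerically. The following is a minimal sketch in plain Python (no PyTorch needed); the helper names `complex_xavier_std` and `sample_complex` are hypothetical and are not part of spectrans. It draws real and imaginary parts independently and compares the mean squared magnitude with the real-valued Xavier variance.

```python
import math
import random


def complex_xavier_std(fan_in: int, fan_out: int, gain: float = 1.0) -> float:
    """Per-component std: the real-valued Xavier std scaled by 1/sqrt(2)."""
    real_std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    return real_std / math.sqrt(2.0)


def sample_complex(n: int, std: float, seed: int = 0) -> list[complex]:
    """Draw n complex values with independent Gaussian real and imaginary parts."""
    rng = random.Random(seed)
    return [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in range(n)]


fan_in, fan_out = 256, 256
std = complex_xavier_std(fan_in, fan_out)
samples = sample_complex(20_000, std)

# E|z|^2 is the sum of the two component variances.
mean_power = sum(abs(z) ** 2 for z in samples) / len(samples)
target = 2.0 / (fan_in + fan_out)  # variance of the real-valued Xavier case
print(f"per-component std={std:.5f}, mean |z|^2={mean_power:.6f}, target={target:.6f}")
```

The mean squared magnitude should land close to the real-valued target, which is the sense in which the total variance is preserved.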
Frequency-Domain Considerations: Parameters operating in frequency domain have different scaling requirements:
- Low frequencies often have higher magnitude than high frequencies
- Initialization should reflect expected frequency content
- Different spectral transforms have different frequency characteristics
Orthogonal Transform Properties: Many spectral transforms are orthogonal/unitary and require special treatment:
- Parameters should preserve orthogonality during training
- Initial values should respect the mathematical structure
- Gradients may need special handling to maintain constraints
Mathematical Foundations:
Xavier/Glorot Initialization:
- For real-valued parameters: σ² = 2/(n_in + n_out)
- For complex-valued parameters: σ² = 1/(n_in + n_out) for each of the real and imaginary parts, so the total variance matches the real-valued case
Kaiming/He Initialization:
- For ReLU activation: σ² = 2/n_in
- Complex variant: σ² = 1/n_in for each of the real and imaginary parts
Orthogonal Initialization:
- Creates matrices with orthonormal rows/columns using QR decomposition
- Essential for transforms requiring orthogonality constraints
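As a worked example of the formulas above, take a square layer with n_in = n_out = 512. The complex values below read the complex-valued formulas as per-component variances, which is an interpretation consistent with the 1/√2 scaling described under Complex Parameter Scaling rather than something confirmed from the source code.

```latex
\begin{aligned}
\text{Xavier (real):}\quad
  & \sigma^2 = \frac{2}{512 + 512} = \frac{1}{512} \approx 1.95 \times 10^{-3},
  && \sigma \approx 0.0442 \\
\text{Xavier (complex, per component):}\quad
  & \sigma^2 = \frac{1}{512 + 512} = \frac{1}{1024} \approx 9.77 \times 10^{-4},
  && \sigma = \tfrac{1}{32} = 0.03125 \\
\text{Kaiming (real, ReLU):}\quad
  & \sigma^2 = \frac{2}{512} = \frac{1}{256} \approx 3.91 \times 10^{-3},
  && \sigma = \tfrac{1}{16} = 0.0625 \\
\text{Kaiming (complex, per component):}\quad
  & \sigma^2 = \frac{1}{512} \approx 1.95 \times 10^{-3},
  && \sigma \approx 0.0442
\end{aligned}
```

In both complex cases the real and imaginary variances add up to the corresponding real-valued variance.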
Transform-Specific Considerations:
FFT Parameters:
- Complex-valued requiring careful magnitude/phase initialization
- Often benefit from frequency-aware initialization
- Should maintain Parseval's theorem properties
DCT/DST Parameters:
- Real-valued but with cosine/sine basis constraints
- Energy compaction properties should be preserved
- Orthogonality is crucial for proper reconstruction
Hadamard Parameters:
- Binary {-1, +1} structure should be respected
- Fast transform structure affects parameter scaling
- Power-of-2 constraints affect initialization patterns
Wavelet Parameters:
- Multi-resolution structure requires level-aware initialization
- Different wavelets have different scaling properties
- Perfect reconstruction constraints must be maintained
Implementation Details:
- Gradient Preservation: All initializations maintain gradient flow
- Device Handling: Automatically matches input tensor device and dtype
- Batch Operations: Efficient initialization for large parameter sets
- Memory Efficiency: In-place initialization where possible
- Numerical Stability: Careful handling of edge cases and extreme values
Common Patterns:
- Spectral Mixing Layers: Use frequency_init with appropriate frequency ranges
- Complex Attention: Use complex_xavier_init for query/key/value projections
- Transform Embeddings: Use transform-specific initialization (dct_init, etc.)
- Learnable Filters: Use orthogonal_spectral_init to maintain properties
- Residual Connections: Use xavier_spectral_init with proper gain scheduling
Performance Considerations:
- All initialization functions are vectorized and GPU-compatible
- Large parameter tensors are handled efficiently
- Memory usage is optimized for typical spectral network sizes
- Initialization time is minimized through optimized algorithms
See Also
- spectrans.core.base : Base classes requiring proper initialization
- spectrans.transforms : Transform classes with specific initialization needs
- spectrans.utils.complex : Complex tensor operations for initialized parameters
- torch.nn.init : PyTorch's standard initialization functions
Functions
spectral_init
Initialize tensor with spectral-aware method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Tensor to initialize. | required |
| mode | str | Initialization mode: "normal", "uniform", "xavier", "kaiming", "orthogonal". | "normal" |
| gain | float | Scaling factor for initialization. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If mode is not supported or gain is not positive. |
| RuntimeError | If tensor is not 2D for orthogonal initialization. |
Source code in spectrans/utils/initialization.py
xavier_spectral_init
xavier_spectral_init(tensor: Tensor, gain: float = 1.0, distribution: Literal['normal', 'uniform'] = 'normal') -> Tensor
Xavier/Glorot initialization adapted for spectral transforms.
Maintains variance of activations and gradients across layers by scaling based on input and output dimensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Tensor to initialize. | required |
| gain | float | Scaling factor for initialization. | 1.0 |
| distribution | Literal['normal', 'uniform'] | Distribution to use for initialization. | "normal" |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If tensor has fewer than 2 dimensions, gain is not positive, or distribution is invalid. |
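The scale implied by the Xavier formula can be sketched in a few lines. This is an illustration of the standard Glorot arithmetic, not the spectrans implementation; `xavier_scale` is a hypothetical helper that returns the standard deviation for the normal case and the bound a of U(-a, a) for the uniform case, and it mirrors the documented error conditions.

```python
import math


def xavier_scale(fan_in: int, fan_out: int, gain: float = 1.0,
                 distribution: str = "normal") -> float:
    """Return the std (normal) or the bound a of U(-a, a) (uniform)."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    std = gain * math.sqrt(2.0 / (fan_in + fan_out))
    if distribution == "normal":
        return std
    if distribution == "uniform":
        # U(-a, a) has variance a^2 / 3, so a = sqrt(3) * std.
        return math.sqrt(3.0) * std
    raise ValueError(f"unsupported distribution: {distribution!r}")


print(xavier_scale(512, 512))                          # std for the normal case
print(xavier_scale(512, 512, distribution="uniform"))  # bound for the uniform case
```

Both distributions end up with the same variance; only the shape of the distribution differs.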
kaiming_spectral_init
kaiming_spectral_init(tensor: Tensor, gain: float = 1.0, mode: Literal['fan_in', 'fan_out'] = 'fan_in', nonlinearity: str = 'relu') -> Tensor
Kaiming/He initialization adapted for spectral transforms.
Designed for networks with ReLU-like activations, maintaining variance through forward/backward passes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Tensor to initialize. | required |
| gain | float | Scaling factor for initialization. | 1.0 |
| mode | Literal['fan_in', 'fan_out'] | Fan mode for variance calculation. | "fan_in" |
| nonlinearity | str | Nonlinearity type for gain calculation. | "relu" |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If tensor has fewer than 2 dimensions or any parameter is invalid. |
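To make the role of the nonlinearity concrete, here is a small sketch of the usual Kaiming arithmetic. The gain values follow the conventions used by PyTorch's `torch.nn.init.calculate_gain`; the helper names are hypothetical, and multiplying the extra `gain` argument into the nonlinearity gain is an assumption about how the two combine, not something stated in the documentation above.

```python
import math


def nonlinearity_gain(nonlinearity: str, negative_slope: float = 0.01) -> float:
    """Recommended gain for a few common nonlinearities."""
    if nonlinearity in ("linear", "sigmoid"):
        return 1.0
    if nonlinearity == "tanh":
        return 5.0 / 3.0
    if nonlinearity == "relu":
        return math.sqrt(2.0)
    if nonlinearity == "leaky_relu":
        return math.sqrt(2.0 / (1.0 + negative_slope ** 2))
    raise ValueError(f"unsupported nonlinearity: {nonlinearity!r}")


def kaiming_std(fan: int, nonlinearity: str = "relu", gain: float = 1.0) -> float:
    """std = gain * nonlinearity_gain / sqrt(fan); fan is fan_in or fan_out."""
    if fan <= 0 or gain <= 0:
        raise ValueError("fan and gain must be positive")
    return gain * nonlinearity_gain(nonlinearity) / math.sqrt(fan)


print(kaiming_std(512))                      # ReLU: sqrt(2 / 512)
print(kaiming_std(512, nonlinearity="tanh"))
```

With `mode='fan_in'` the fan is the number of inputs per output unit, which keeps forward activations at a stable scale; `fan_out` does the same for backward gradients.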
orthogonal_spectral_init
Orthogonal initialization for spectral transform matrices.
Creates orthogonal matrices that preserve norms, which is important for spectral transforms that should maintain energy conservation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | 2D tensor to initialize. | required |
| gain | float | Scaling factor for the orthogonal matrix. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized orthogonal tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If tensor is not 2D or gain is not positive. |
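The norm-preserving property can be illustrated without any tensor library. The sketch below builds a random orthogonal matrix with modified Gram-Schmidt, which produces the Q factor of a QR decomposition; it is a stand-in for a library QR routine and not the spectrans implementation. `random_orthogonal` is a hypothetical name.

```python
import math
import random


def random_orthogonal(n: int, gain: float = 1.0, seed: int = 0) -> list[list[float]]:
    """Orthonormalize n random Gaussian rows, then scale by gain."""
    rng = random.Random(seed)
    rows: list[list[float]] = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Modified Gram-Schmidt: subtract the projection onto each accepted row.
        for q in rows:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return [[gain * a for a in row] for row in rows]


q = random_orthogonal(8)
x = [float(i + 1) for i in range(8)]
qx = [sum(a * b for a, b in zip(row, x)) for row in q]

# With gain = 1 the transform preserves the Euclidean norm of x.
norm_x = math.sqrt(sum(a * a for a in x))
norm_qx = math.sqrt(sum(a * a for a in qx))
print(norm_x, norm_qx)
```

A gain other than 1 scales every output norm by that factor, so energy conservation in the strict sense holds only for `gain=1.0`.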
complex_normal_init
Initialize complex tensor with complex normal distribution.
Both real and imaginary parts are initialized independently with normal distribution scaled to maintain proper variance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Complex tensor to initialize. | required |
| std | float | Standard deviation for each component. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized complex tensor. |
Raises:
| Type | Description |
|---|---|
| TypeError | If tensor is not complex. |
| ValueError | If std is not positive. |
complex_xavier_init
Xavier initialization for complex tensors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Complex tensor to initialize. | required |
| gain | float | Scaling factor for initialization. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized complex tensor. |
Raises:
| Type | Description |
|---|---|
| TypeError | If tensor is not complex. |
| ValueError | If tensor dimensions or gain are invalid. |
complex_kaiming_init
complex_kaiming_init(tensor: Tensor, gain: float = 1.0, mode: Literal['fan_in', 'fan_out'] = 'fan_in') -> Tensor
Kaiming initialization for complex tensors.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Complex tensor to initialize. | required |
| gain | float | Scaling factor for initialization. | 1.0 |
| mode | Literal['fan_in', 'fan_out'] | Fan mode for variance calculation. | "fan_in" |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized complex tensor. |
Raises:
| Type | Description |
|---|---|
| TypeError | If tensor is not complex. |
| ValueError | If tensor dimensions or parameters are invalid. |
frequency_init
Initialize tensor with frequency-domain aware values.
Initializes with small values at high frequencies and larger values at low frequencies, mimicking natural signal characteristics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Tensor to initialize (typically frequency domain parameters). | required |
| max_freq | float | Maximum frequency for scaling. | 1.0 |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If max_freq is not positive. |
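One way to picture "larger at low frequencies, smaller at high frequencies" is a per-bin scale that decays with frequency. The sketch below is purely illustrative: the 1/(1 + f) decay law, the base `std`, and the helper names are assumptions, since the documentation above does not specify the profile that `frequency_init` actually uses.

```python
import random


def frequency_scales(n_bins: int, max_freq: float = 1.0) -> list[float]:
    """Scale for each frequency bin, decaying as 1 / (1 + f) (assumed profile)."""
    if max_freq <= 0:
        raise ValueError("max_freq must be positive")
    if n_bins == 1:
        return [1.0]
    # Bin k maps to frequency f_k in [0, max_freq].
    return [1.0 / (1.0 + max_freq * k / (n_bins - 1)) for k in range(n_bins)]


def frequency_aware_values(n_bins: int, max_freq: float = 1.0,
                           std: float = 0.02, seed: int = 0) -> list[float]:
    """Gaussian draws whose spread shrinks toward the high-frequency bins."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) * s for s in frequency_scales(n_bins, max_freq)]


scales = frequency_scales(5, max_freq=1.0)
values = frequency_aware_values(5)
print(scales)
```

A larger `max_freq` steepens the decay under this assumed profile, so the highest bins start closer to zero.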
wavelet_init
Initialize tensor with wavelet-like properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Tensor to initialize. | required |
| wavelet_type | str | Type of wavelet initialization. | "db1" |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor. |
Raises:
| Type | Description |
|---|---|
| ValueError | If wavelet_type is not supported. |
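The default wavelet type, "db1", is the Haar wavelet. As background on the perfect-reconstruction property mentioned in the notes, here is a minimal one-level Haar analysis/synthesis pair in plain Python. It illustrates the filter structure only and makes no claim about what values `wavelet_init` writes into the tensor; the function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)
LOW = (S, S)     # Haar scaling (low-pass) filter
HIGH = (S, -S)   # Haar wavelet (high-pass) filter


def haar_analysis(x: list[float]) -> tuple[list[float], list[float]]:
    """One decomposition level on an even-length signal."""
    approx = [LOW[0] * x[i] + LOW[1] * x[i + 1] for i in range(0, len(x), 2)]
    detail = [HIGH[0] * x[i] + HIGH[1] * x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail


def haar_synthesis(approx: list[float], detail: list[float]) -> list[float]:
    """Invert haar_analysis by applying the transposed filters."""
    x: list[float] = []
    for a, d in zip(approx, detail):
        x.append(LOW[0] * a + HIGH[0] * d)
        x.append(LOW[1] * a + HIGH[1] * d)
    return x


signal = [4.0, 2.0, 5.0, 7.0]
approx, detail = haar_analysis(signal)
reconstructed = haar_synthesis(approx, detail)
print(reconstructed)
```

Because the two filters are orthonormal, the coefficients carry the same energy as the signal and the round trip recovers it up to floating-point error.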
hadamard_init
Initialize tensor with Hadamard matrix properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | Square tensor to initialize. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor with Hadamard-like structure. |
Raises:
| Type | Description |
|---|---|
| ValueError | If tensor is not square or not power-of-2 sized. |
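The power-of-2 requirement comes from the way Hadamard matrices are usually built. The sketch below uses Sylvester's recursive construction in plain Python; `hadamard` and its `normalized` flag are hypothetical names for illustration, and whether `hadamard_init` normalizes or perturbs the result is not stated above.

```python
def hadamard(n: int, normalized: bool = True) -> list[list[float]]:
    """Sylvester construction: H_2n = [[H_n, H_n], [H_n, -H_n]]."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    # Dividing by sqrt(n) makes the rows orthonormal.
    scale = n ** -0.5 if normalized else 1.0
    return [[scale * v for v in row] for row in h]


for row in hadamard(4, normalized=False):
    print(row)
```

With normalization the matrix is orthogonal, so it combines the {-1, +1} sign pattern with the norm-preserving behaviour discussed for orthogonal initialization.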
dct_init
Initialize tensor with DCT matrix properties.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tensor | Tensor | 2D tensor to initialize. | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Initialized tensor with DCT-like structure. |
Raises:
| Type | Description |
|---|---|
| ValueError | If tensor is not 2D. |
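For reference, the orthonormal DCT-II basis can be written out directly. This sketch assumes the DCT-II variant with orthonormal scaling, which is a common choice but is not confirmed as the one `dct_init` uses; `dct_matrix` is a hypothetical helper.

```python
import math


def dct_matrix(n: int) -> list[list[float]]:
    """Orthonormal DCT-II: C[k][i] = alpha_k * cos(pi * (2i + 1) * k / (2n))."""
    rows = []
    for k in range(n):
        # The constant (k = 0) row needs a smaller factor to have unit norm.
        alpha = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([alpha * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


c = dct_matrix(8)
# Gram matrix C C^T should be the identity for an orthonormal basis.
gram = [[sum(a * b for a, b in zip(c[i], c[j])) for j in range(8)] for i in range(8)]
print(gram[0][0], gram[0][1])
```

Row k oscillates k half-cycles across the input, so the first rows capture smooth content, which is the source of the energy compaction property mentioned in the notes.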
init_linear_spectral
Initialize linear layer with spectral-aware method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| linear | Linear | Linear layer to initialize. | required |
| method | str | Initialization method: "xavier", "kaiming", "orthogonal". | "xavier" |
Returns:
| Type | Description |
|---|---|
| Linear | Initialized linear layer. |
Raises:
| Type | Description |
|---|---|
| ValueError | If method is not supported. |
init_conv_spectral
Initialize convolution layer with spectral-aware method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| conv | Conv1d \| Conv2d | Convolution layer to initialize. | required |
| method | str | Initialization method: "xavier", "kaiming". | "kaiming" |
Returns:
| Type | Description |
|---|---|
| Conv1d \| Conv2d | Initialized convolution layer. |
Raises:
| Type | Description |
|---|---|
| ValueError | If method is not supported. |
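For convolution weights, the fan values used by Xavier and Kaiming scaling include the kernel's receptive field. The sketch below follows the usual PyTorch weight layout (out_channels, in_channels / groups, *kernel_size); `conv_fans` is a hypothetical helper, and the assumption that `init_conv_spectral` computes fans this way is not confirmed by the documentation above.

```python
import math


def conv_fans(weight_shape: tuple[int, ...]) -> tuple[int, int]:
    """Return (fan_in, fan_out) for a conv weight of shape (out, in, *kernel)."""
    if len(weight_shape) < 3:
        raise ValueError("expected a convolution weight with at least 3 dimensions")
    out_channels, in_channels, *kernel = weight_shape
    receptive_field = math.prod(kernel)
    return in_channels * receptive_field, out_channels * receptive_field


# Weight shape of nn.Conv1d(512, 512, kernel_size=3)
fan_in, fan_out = conv_fans((512, 512, 3))
kaiming_relu_std = math.sqrt(2.0 / fan_in)
print(fan_in, fan_out, kaiming_relu_std)
```

Compared with a 512-by-512 linear layer, the kernel size of 3 triples the fan-in, so the Kaiming standard deviation shrinks by a factor of √3.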