Kernel Functions

spectrans.kernels

Kernel functions for spectral transformers.

This module provides kernel functions and feature maps used in spectral attention mechanisms and other kernel-based methods. It includes both explicit kernel evaluations and implicit representations through random feature maps.

The kernels approximate attention mechanisms with linear complexity through random feature expansions and spectral decompositions.

Modules:

base : Base classes and interfaces for kernel functions.
rff : Random Fourier Features implementations.
spectral : Spectral kernel functions and decompositions.

Classes:

CosineKernel : Cosine similarity kernel.
FourierKernel : Kernel defined in Fourier domain.
GaussianRFFKernel : Gaussian kernel with RFF approximation.
KernelFunction : Abstract base class for kernel functions.
KernelType : Type literal for kernel selection.
LaplacianRFFKernel : Laplacian kernel with RFF approximation.
LearnableSpectralKernel : Spectral kernel with learnable parameters.
OrthogonalRandomFeatures : Orthogonal variant of random features.
PolynomialKernel : Polynomial kernel implementation.
PolynomialSpectralKernel : Polynomial kernel with spectral decomposition.
RFFAttentionKernel : RFF designed for attention mechanisms.
RandomFeatureMap : Abstract base class for random feature approximations.
ShiftInvariantKernel : Base class for shift-invariant kernels.
SpectralKernel : Base class for spectral kernels.
TruncatedSVDKernel : Kernel approximation via truncated SVD.

Examples:

Using Gaussian RFF kernel:

>>> import torch
>>> from spectrans.kernels import GaussianRFFKernel
>>> kernel = GaussianRFFKernel(input_dim=64, num_features=256, sigma=1.0)
>>> x = torch.randn(32, 100, 64)
>>> features = kernel(x)
>>> assert features.shape == (32, 100, 256)

Using learnable spectral kernel:

>>> from spectrans.kernels import LearnableSpectralKernel
>>> kernel = LearnableSpectralKernel(input_dim=64, rank=16)
>>> K = kernel.compute(x, x)
>>> assert K.shape == (32, 100, 100)
Notes

Kernel approximations give attention mechanisms linear complexity in sequence length by replacing the explicit kernel matrix with random feature expansions or spectral decompositions. Random Fourier Features, based on Bochner's theorem, approximate shift-invariant kernels via the factorization \(k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})\), where \(\varphi\) maps inputs to a finite-dimensional feature space.

Spectral decomposition methods use eigendecompositions to build low-rank approximations of the kernel, while orthogonal feature variants apply orthogonalized random projections to reduce approximation variance. For random feature maps, the approximation error decreases as \(O(1/\sqrt{D})\), where \(D\) is the number of random features.
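
The effect of the number of features can be sketched with a small standalone experiment. The code below does not use spectrans; the helper names `gaussian_rff` and `max_rff_error` are hypothetical, and the sampling scheme (normal frequencies scaled by 1/sigma, uniform phases) is the standard textbook construction, assumed here for illustration.

```python
import math
import torch


def gaussian_rff(x, omega, bias):
    """Map x of shape (..., n, d) to cosine features of shape (..., n, D)."""
    num_features = omega.shape[-1]
    return math.sqrt(2.0 / num_features) * torch.cos(x @ omega + bias)


def max_rff_error(x, sigma, num_features, generator):
    """Largest absolute gap between the exact Gaussian kernel and its RFF estimate."""
    d = x.shape[-1]
    # Frequencies follow the Gaussian kernel's spectral distribution N(0, sigma^-2 I).
    omega = torch.randn(d, num_features, generator=generator, dtype=x.dtype) / sigma
    # Phases are uniform on [0, 2*pi).
    bias = 2 * math.pi * torch.rand(num_features, generator=generator, dtype=x.dtype)
    phi = gaussian_rff(x, omega, bias)
    approx = phi @ phi.transpose(-2, -1)
    exact = torch.exp(-torch.cdist(x, x) ** 2 / (2 * sigma**2))
    return (approx - exact).abs().max().item()


gen = torch.Generator().manual_seed(0)
x = torch.randn(8, 4, generator=gen, dtype=torch.float64)
err_small = max_rff_error(x, sigma=1.0, num_features=64, generator=gen)
err_large = max_rff_error(x, sigma=1.0, num_features=4096, generator=gen)
```

With 64 times more features, the error is expected to shrink by roughly a factor of 8, although any single draw fluctuates around that rate.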

References

Ali Rahimi and Benjamin Recht. 2007. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20 (NeurIPS 2007), pages 1177-1184.

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, and Adrian Weller. 2021. Rethinking attention with performers. In Proceedings of the International Conference on Learning Representations (ICLR).

See Also

spectrans.layers.attention : Attention layers using these kernels.
spectrans.kernels.base : Base kernel interfaces.
spectrans.kernels.rff : Random Fourier Features implementations.

Classes

CosineKernel

CosineKernel(eps: float = 1e-08)

Bases: KernelFunction

Cosine similarity kernel.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\| \|\mathbf{y}\|}\).

Parameters:

Name Type Description Default
eps float

Small value for numerical stability.

1e-8

Attributes:

Name Type Description
eps float

Numerical stability parameter.

Methods:

Name Description
compute

Compute cosine similarity kernel matrix.

Source code in spectrans/kernels/base.py
def __init__(self, eps: float = 1e-8):
    self.eps = eps
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute cosine similarity kernel matrix.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute cosine similarity kernel matrix.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    x_norm = torch.norm(x, dim=-1, keepdim=True)  # (..., n, 1)
    y_norm = torch.norm(y, dim=-1, keepdim=True)  # (..., m, 1)

    x_normalized = x / (x_norm + self.eps)
    y_normalized = y / (y_norm + self.eps)

    return torch.matmul(x_normalized, y_normalized.transpose(-2, -1))
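
As a quick illustration of the behavior, the following standalone sketch re-implements the same normalization as a free function (the name `cosine_kernel` is hypothetical) and checks it against `torch.nn.functional.cosine_similarity`. It also shows that rescaling the inputs leaves the kernel unchanged and that `eps` keeps an all-zero row finite.

```python
import torch
import torch.nn.functional as F


def cosine_kernel(x, y, eps=1e-8):
    """Cosine similarity between rows of x (..., n, d) and y (..., m, d)."""
    x_unit = x / (x.norm(dim=-1, keepdim=True) + eps)
    y_unit = y / (y.norm(dim=-1, keepdim=True) + eps)
    return x_unit @ y_unit.transpose(-2, -1)


gen = torch.Generator().manual_seed(0)
x = torch.randn(2, 5, 16, generator=gen)
y = torch.randn(2, 7, 16, generator=gen)

K = cosine_kernel(x, y)                       # shape (2, 5, 7)
K_scaled = cosine_kernel(3.0 * x, 0.5 * y)    # positive rescaling leaves K unchanged

# Reference: broadcast every row of x against every row of y.
K_ref = F.cosine_similarity(x.unsqueeze(-2), y.unsqueeze(-3), dim=-1)

# An all-zero row divides by eps instead of zero, so the result stays finite.
zero_row = cosine_kernel(torch.zeros(1, 16), y[0])
```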

KernelFunction

Bases: ABC

Abstract base class for kernel functions.

A kernel function \(k(\mathbf{x}, \mathbf{y})\) defines a similarity measure between inputs \(\mathbf{x}\) and \(\mathbf{y}\), satisfying positive semi-definiteness properties. This interface supports both explicit kernel evaluation and feature map representations.

Methods:

Name Description
compute

Compute kernel values between x and y.

gram_matrix

Compute Gram matrix \(K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\).

is_positive_definite

Check if the kernel yields a positive definite Gram matrix.

Functions
compute abstractmethod
compute(x: Tensor, y: Tensor) -> Tensor

Compute kernel values between x and y.

Parameters:

Name Type Description Default
x Tensor

First input tensor of shape (..., n, d).

required
y Tensor

Second input tensor of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m) where element \((i,j)\) contains \(k(\mathbf{x}_i, \mathbf{y}_j)\).

Source code in spectrans/kernels/base.py
@abstractmethod
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    r"""Compute kernel values between x and y.

    Parameters
    ----------
    x : Tensor
        First input tensor of shape (..., n, d).
    y : Tensor
        Second input tensor of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m) where element $(i,j)$
        contains $k(\mathbf{x}_i, \mathbf{y}_j)$.
    """
    pass
gram_matrix
gram_matrix(x: Tensor) -> Tensor

Compute Gram matrix \(K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\).

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Gram matrix of shape (..., n, n).

Source code in spectrans/kernels/base.py
def gram_matrix(self, x: Tensor) -> Tensor:
    r"""Compute Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Gram matrix of shape (..., n, n).
    """
    return self.compute(x, x)
is_positive_definite
is_positive_definite(x: Tensor, eps: float = 1e-06) -> bool

Check if the kernel yields a positive definite Gram matrix.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required
eps float

Tolerance for eigenvalue positivity check.

1e-6

Returns:

Type Description
bool

True if all eigenvalues of Gram matrix are > eps.

Source code in spectrans/kernels/base.py
def is_positive_definite(self, x: Tensor, eps: float = 1e-6) -> bool:
    """Check if the kernel yields a positive definite Gram matrix.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).
    eps : float, default=1e-6
        Tolerance for eigenvalue positivity check.

    Returns
    -------
    bool
        True if all eigenvalues of Gram matrix are > eps.
    """
    gram = self.gram_matrix(x)
    eigenvalues = torch.linalg.eigvalsh(gram)
    return bool(torch.all(eigenvalues > eps).item())
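
The strict inequality matters in practice. The sketch below (standalone, with a hypothetical free function mirroring the check above) contrasts a linear-kernel Gram matrix, which is only positive semi-definite when there are more points than dimensions, with a Gaussian Gram matrix on distinct points.

```python
import torch


def is_positive_definite(gram, eps=1e-6):
    """True if every eigenvalue of the symmetric matrix `gram` exceeds eps."""
    return bool((torch.linalg.eigvalsh(gram) > eps).all())


# Five distinct points in three dimensions (n = 5, d = 3).
x = torch.tensor(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]],
    dtype=torch.float64,
)

# Linear kernel: x x^T has rank at most d = 3 < n = 5, so at least two
# eigenvalues are zero and the strict check fails.
linear_gram = x @ x.T

# Gaussian kernel (sigma = 1) on distinct points has strictly positive eigenvalues.
gaussian_gram = torch.exp(-torch.cdist(x, x) ** 2 / 2.0)

linear_is_pd = is_positive_definite(linear_gram)
gaussian_is_pd = is_positive_definite(gaussian_gram)
```

Double precision is used here so that the zero eigenvalues of the linear Gram matrix are not pushed above `eps` by rounding.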

PolynomialKernel

PolynomialKernel(degree: int = 2, alpha: float = 1.0, coef0: float = 0.0)

Bases: KernelFunction

Polynomial kernel.

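The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = (\alpha \langle \mathbf{x}, \mathbf{y} \rangle + c)^d\), where \(\alpha\) is alpha, \(c\) is coef0, and the exponent \(d\) is degree (not the input dimension used in the tensor shapes below).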

Parameters:

Name Type Description Default
degree int

Polynomial degree.

2
alpha float

Scaling of inner product.

1.0
coef0 float

Constant term.

0.0

Attributes:

Name Type Description
degree int

The polynomial degree.

alpha float

Inner product scaling.

coef0 float

Constant coefficient.

Methods:

Name Description
compute

Compute polynomial kernel matrix.

Source code in spectrans/kernels/base.py
def __init__(
    self,
    degree: int = 2,
    alpha: float = 1.0,
    coef0: float = 0.0,
):
    self.degree = degree
    self.alpha = alpha
    self.coef0 = coef0
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute polynomial kernel matrix.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute polynomial kernel matrix.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    inner_product = torch.matmul(x, y.transpose(-2, -1))
    return (self.alpha * inner_product + self.coef0) ** self.degree
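
To connect the polynomial kernel with the feature-map view used elsewhere in this module, the standalone sketch below checks that the default configuration (degree 2, alpha 1, coef0 0) equals an inner product of explicit features. The function names are hypothetical and the feature map shown is one valid choice, not something the library is known to provide.

```python
import torch


def polynomial_kernel(x, y, degree=2, alpha=1.0, coef0=0.0):
    """(alpha * <x, y> + coef0) ** degree for all row pairs of x and y."""
    return (alpha * (x @ y.transpose(-2, -1)) + coef0) ** degree


def degree2_features(x):
    """Explicit feature map for the homogeneous degree-2 kernel <x, y>^2.

    Returns all pairwise products x_i * x_j, flattened to shape (..., n, d * d).
    """
    return (x.unsqueeze(-1) * x.unsqueeze(-2)).flatten(start_dim=-2)


gen = torch.Generator().manual_seed(0)
x = torch.randn(4, 3, generator=gen, dtype=torch.float64)
y = torch.randn(6, 3, generator=gen, dtype=torch.float64)

K = polynomial_kernel(x, y)                                     # shape (4, 6)
K_from_features = degree2_features(x) @ degree2_features(y).T   # same values
```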

RandomFeatureMap

RandomFeatureMap(input_dim: int, num_features: int, kernel_scale: float = 1.0, seed: int | None = None)

Bases: Module, ABC

Abstract base class for random feature map approximations.

Random feature maps provide finite-dimensional approximations to kernel functions through the mapping:

\(k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})\)

This enables linear-time computation of kernel operations.

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random features (D).

required
kernel_scale float

Scaling parameter for the kernel.

1.0
seed int | None

Random seed for reproducibility.

None

Attributes:

Name Type Description
input_dim int

Input dimension.

num_features int

Number of random features.

kernel_scale float

Kernel scaling parameter.

Methods:

Name Description
forward

Apply feature map to input.

kernel_approximation

Approximate kernel matrix using feature maps.

Source code in spectrans/kernels/base.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_scale: float = 1.0,
    seed: int | None = None,
):
    super().__init__()
    self.input_dim = input_dim
    self.num_features = num_features
    self.kernel_scale = kernel_scale

    if seed is not None:
        torch.manual_seed(seed)
Functions
forward abstractmethod
forward(x: Tensor) -> Tensor

Apply feature map to input.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D) where D is the number of random features.

Source code in spectrans/kernels/base.py
@abstractmethod
def forward(self, x: Tensor) -> Tensor:
    """Apply feature map to input.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D) where D
        is the number of random features.
    """
    pass
kernel_approximation
kernel_approximation(x: Tensor, y: Tensor) -> Tensor

Approximate kernel matrix using feature maps.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Approximated kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py
def kernel_approximation(self, x: Tensor, y: Tensor) -> Tensor:
    """Approximate kernel matrix using feature maps.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Approximated kernel matrix of shape (..., n, m).
    """
    phi_x = self.forward(x)  # (..., n, D)
    phi_y = self.forward(y)  # (..., m, D)
    return torch.matmul(phi_x, phi_y.transpose(-2, -1))
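
The method above forms the full (n, m) matrix, which is useful for inspection. The linear-time benefit comes from never building that matrix when it is applied to values. A minimal sketch of the reordering, using a hypothetical ELU + 1 feature map chosen only for illustration:

```python
import torch
import torch.nn.functional as F


def feature_map(x):
    """Hypothetical positive feature map (ELU + 1), used only for illustration."""
    return F.elu(x) + 1.0


gen = torch.Generator().manual_seed(0)
n, m, d, d_v = 6, 10, 4, 5
q = torch.randn(n, d, generator=gen, dtype=torch.float64)
k = torch.randn(m, d, generator=gen, dtype=torch.float64)
v = torch.randn(m, d_v, generator=gen, dtype=torch.float64)

phi_q, phi_k = feature_map(q), feature_map(k)

# Quadratic route: build the (n, m) kernel matrix, then apply it to v.
out_quadratic = (phi_q @ phi_k.T) @ v

# Linear route: contract keys with values first; no (n, m) matrix is formed.
kv = phi_k.T @ v              # shape (D, d_v), here D == d
out_linear = phi_q @ kv       # shape (n, d_v)
```

Both routes give the same result by associativity of matrix multiplication; the second costs on the order of (n + m) * D * d_v operations instead of n * m * (D + d_v).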

ShiftInvariantKernel

ShiftInvariantKernel(bandwidth: float = 1.0)

Bases: KernelFunction

Base class for shift-invariant (stationary) kernels.

Shift-invariant kernels depend only on the difference \(\mathbf{x} - \mathbf{y}\), i.e., \(k(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y}, \mathbf{0}) = \kappa(\mathbf{x} - \mathbf{y})\) for some function \(\kappa\).

These kernels admit Random Fourier Features approximation via Bochner's theorem.
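
A sketch of the standard argument, assuming \(\kappa\) is continuous, real-valued, and normalized so that \(\kappa(\mathbf{0}) = 1\): Bochner's theorem states that \(\kappa\) is the Fourier transform of a probability distribution \(p(\omega)\), so

\[
\kappa(\mathbf{x} - \mathbf{y}) = \mathbb{E}_{\omega \sim p}\left[\cos\left(\omega^T (\mathbf{x} - \mathbf{y})\right)\right].
\]

Adding a phase \(b \sim \mathrm{Uniform}[0, 2\pi]\) and using \(2\cos A \cos B = \cos(A - B) + \cos(A + B)\) gives

\[
\mathbb{E}_{\omega, b}\left[2\cos(\omega^T \mathbf{x} + b)\cos(\omega^T \mathbf{y} + b)\right]
= \mathbb{E}_{\omega}\left[\cos\left(\omega^T(\mathbf{x} - \mathbf{y})\right)\right]
+ \mathbb{E}_{\omega, b}\left[\cos\left(\omega^T(\mathbf{x} + \mathbf{y}) + 2b\right)\right]
= \kappa(\mathbf{x} - \mathbf{y}),
\]

because the second expectation vanishes when averaged over \(b\). Averaging over \(D\) independent draws \((\omega_j, b_j)\) yields the feature map \(\varphi(\mathbf{x}) = \sqrt{2/D}\,[\cos(\omega_j^T \mathbf{x} + b_j)]_{j=1}^{D}\), for which \(\mathbb{E}[\varphi(\mathbf{x})^T \varphi(\mathbf{y})] = \kappa(\mathbf{x} - \mathbf{y})\).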

Parameters:

Name Type Description Default
bandwidth float

Kernel bandwidth parameter (inverse of length scale).

1.0

Attributes:

Name Type Description
bandwidth float

The bandwidth parameter.

Methods:

Name Description
evaluate_difference

Evaluate kernel on difference vectors.

compute

Compute kernel matrix for shift-invariant kernel.

spectral_density

Fourier transform of the kernel (spectral density).

Source code in spectrans/kernels/base.py
def __init__(self, bandwidth: float = 1.0):
    self.bandwidth = bandwidth
Functions
evaluate_difference abstractmethod
evaluate_difference(diff: Tensor) -> Tensor

Evaluate kernel on difference vectors.

Parameters:

Name Type Description Default
diff Tensor

Difference vectors \(\mathbf{x} - \mathbf{y}\) of shape (..., d).

required

Returns:

Type Description
Tensor

Kernel values \(\kappa(\text{diff})\) of shape (...).

Source code in spectrans/kernels/base.py
@abstractmethod
def evaluate_difference(self, diff: Tensor) -> Tensor:
    r"""Evaluate kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors $\mathbf{x} - \mathbf{y}$ of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values $\kappa(\text{diff})$ of shape (...).
    """
    pass
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute kernel matrix for shift-invariant kernel.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute kernel matrix for shift-invariant kernel.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    # Compute pairwise differences
    x_expanded = x.unsqueeze(-2)  # (..., n, 1, d)
    y_expanded = y.unsqueeze(-3)  # (..., 1, m, d)
    diff = x_expanded - y_expanded  # (..., n, m, d)

    # Evaluate kernel on differences
    return self.evaluate_difference(diff)
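
The broadcasting pattern above materializes an (..., n, m, d) tensor. The standalone sketch below reproduces it for a Gaussian kernel (the helper `gaussian_on_differences` is hypothetical) and compares it with a distance-based route via `torch.cdist` that avoids the extra dimension.

```python
import torch


def gaussian_on_differences(diff, sigma=1.0):
    """Gaussian kernel evaluated on difference vectors of shape (..., d)."""
    return torch.exp(-diff.pow(2).sum(dim=-1) / (2 * sigma**2))


gen = torch.Generator().manual_seed(0)
x = torch.randn(2, 5, 3, generator=gen, dtype=torch.float64)  # (batch, n, d)
y = torch.randn(2, 7, 3, generator=gen, dtype=torch.float64)  # (batch, m, d)

# Broadcasting (batch, n, 1, d) against (batch, 1, m, d) yields every pair.
diff = x.unsqueeze(-2) - y.unsqueeze(-3)      # (batch, n, m, d)
K = gaussian_on_differences(diff)             # (batch, n, m)

# Same kernel from pairwise distances, without storing the (n, m, d) tensor.
K_cdist = torch.exp(-torch.cdist(x, y) ** 2 / 2.0)
```

For long sequences the intermediate difference tensor can dominate memory, which is one reason a subclass might prefer a distance-based evaluation where the kernel allows it.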
spectral_density abstractmethod
spectral_density(omega: Tensor) -> Tensor

Fourier transform of the kernel (spectral density).

For shift-invariant kernels, this defines the sampling distribution for Random Fourier Features.

Parameters:

Name Type Description Default
omega Tensor

Frequency vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Spectral density values of shape (...).

Source code in spectrans/kernels/base.py
@abstractmethod
def spectral_density(self, omega: Tensor) -> Tensor:
    """Fourier transform of the kernel (spectral density).

    For shift-invariant kernels, this defines the sampling
    distribution for Random Fourier Features.

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    pass

GaussianRFFKernel

GaussianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, orthogonal: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Gaussian (RBF) kernel with Random Fourier Features approximation.

Implements the Gaussian kernel using RFF.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)\).

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random Fourier features.

required
sigma float

Length scale \(\sigma\) of the Gaussian kernel (standard deviation). The inherited bandwidth attribute is set to 1 / sigma.

1.0
use_cos_sin bool

If True, use both cos and sin features (doubles feature dimension).

False
orthogonal bool

If True, use orthogonal random features.

False
trainable bool

If True, make random parameters trainable.

False
seed int | None

Random seed for reproducibility.

None

Attributes:

Name Type Description
omega Parameter or Tensor

Random frequencies of shape (input_dim, num_features).

bias Parameter or Tensor

Random phase shifts of shape (num_features,).

Methods:

Name Description
forward

Apply random Fourier feature map.

evaluate_difference

Evaluate Gaussian kernel on difference vectors.

spectral_density

Spectral density for Gaussian kernel (Gaussian distribution).

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    orthogonal: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.orthogonal = orthogonal
    self.trainable = trainable

    # Effective number of output features
    self.output_features = num_features * 2 if use_cos_sin else num_features

    # Initialize random parameters
    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D) where D is self.output_features.

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D) where D is
        self.output_features.
    """
    # Linear projection: (..., n, d) @ (d, m) -> (..., n, m)
    projection = torch.matmul(x, self.omega)

    # Add phase shifts
    projection = projection + self.bias

    if self.use_cos_sin:
        # Use both cos and sin features
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        # Normalization factor for cos+sin
        scale = math.sqrt(1.0 / self.num_features)
    else:
        # Use only cos features
        features = torch.cos(projection)
        # Normalization factor for cos only
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale
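
The two branches differ in more than output width. The following standalone sketch (a hypothetical free function following the same two scalings) shows that the paired cos/sin features reproduce \(k(\mathbf{x}, \mathbf{x}) = 1\) exactly, while the cos-only features match it only in expectation.

```python
import math
import torch


def rff_features(x, omega, bias, use_cos_sin):
    """Random Fourier features with either cos-only or paired cos/sin outputs."""
    num_features = omega.shape[-1]
    projection = x @ omega + bias
    if use_cos_sin:
        features = torch.cat([projection.cos(), projection.sin()], dim=-1)
        return features * math.sqrt(1.0 / num_features)
    return projection.cos() * math.sqrt(2.0 / num_features)


gen = torch.Generator().manual_seed(0)
d, num_features = 4, 128
x = torch.randn(10, d, generator=gen, dtype=torch.float64)
omega = torch.randn(d, num_features, generator=gen, dtype=torch.float64)
bias = 2 * math.pi * torch.rand(num_features, generator=gen, dtype=torch.float64)

phi_cos = rff_features(x, omega, bias, use_cos_sin=False)     # (10, 128)
phi_pair = rff_features(x, omega, bias, use_cos_sin=True)     # (10, 256)

# cos^2 + sin^2 = 1, so the paired features give a self-similarity of exactly 1;
# the cos-only self-similarity fluctuates around 1.
self_sim_pair = (phi_pair * phi_pair).sum(dim=-1)
self_sim_cos = (phi_cos * phi_cos).sum(dim=-1)
```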
evaluate_difference
evaluate_difference(diff: Tensor) -> Tensor

Evaluate Gaussian kernel on difference vectors.

Parameters:

Name Type Description Default
diff Tensor

Difference vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Kernel values of shape (...).

Source code in spectrans/kernels/rff.py
def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Gaussian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    squared_norm = torch.sum(diff**2, dim=-1)
    return torch.exp(-squared_norm / (2 * self.sigma**2))
spectral_density
spectral_density(omega: Tensor) -> Tensor

Spectral density for Gaussian kernel (Gaussian distribution).

Parameters:

Name Type Description Default
omega Tensor

Frequency vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py
def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Gaussian kernel (Gaussian distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    norm_squared = torch.sum(omega**2, dim=-1)
    # Gaussian spectral density
    result: Tensor = (2 * math.pi * self.sigma**2) ** (d / 2) * torch.exp(
        -0.5 * self.sigma**2 * norm_squared
    )
    return result
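
The expression above integrates to \((2\pi)^d\) over frequency space rather than to 1, so it reads as the kernel's Fourier transform. Under that reading, dividing by \((2\pi)^d\) gives the density from which frequencies would be sampled. The one-dimensional check below is a standalone sketch of this relationship and makes no claim about how the library samples `omega`.

```python
import math
import torch

sigma = 0.7
omega = torch.linspace(-40.0, 40.0, 200_001, dtype=torch.float64)

# One-dimensional case (d = 1) of the expression returned by spectral_density.
fourier_transform = math.sqrt(2 * math.pi * sigma**2) * torch.exp(
    -0.5 * sigma**2 * omega**2
)

# Dividing by (2*pi)^d turns the transform into a probability density.
density = fourier_transform / (2 * math.pi)
total_mass = torch.trapezoid(density, omega).item()

# That density is the normal distribution with standard deviation 1 / sigma.
normal = torch.distributions.Normal(
    torch.tensor(0.0, dtype=torch.float64),
    torch.tensor(1.0 / sigma, dtype=torch.float64),
)
normal_pdf = normal.log_prob(omega).exp()
```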

LaplacianRFFKernel

LaplacianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Laplacian kernel with Random Fourier Features approximation.

Implements the Laplacian kernel using RFF with frequencies drawn from a Cauchy distribution.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|_1}{\sigma}\right)\).

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random Fourier features.

required
sigma float

Scale parameter \(\sigma\) of the Laplacian kernel. The inherited bandwidth attribute is set to 1 / sigma.

1.0
use_cos_sin bool

If True, use both cos and sin features.

False
trainable bool

If True, make random parameters trainable.

False
seed int | None

Random seed for reproducibility.

None

Methods:

Name Description
forward

Apply random Fourier feature map.

evaluate_difference

Evaluate Laplacian kernel on difference vectors.

spectral_density

Spectral density for Laplacian kernel (Cauchy distribution).

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.trainable = trainable

    self.output_features = num_features * 2 if use_cos_sin else num_features

    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    projection = torch.matmul(x, self.omega) + self.bias

    if self.use_cos_sin:
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        scale = math.sqrt(1.0 / self.num_features)
    else:
        features = torch.cos(projection)
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale
evaluate_difference
evaluate_difference(diff: Tensor) -> Tensor

Evaluate Laplacian kernel on difference vectors.

Parameters:

Name Type Description Default
diff Tensor

Difference vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Kernel values of shape (...).

Source code in spectrans/kernels/rff.py
def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Laplacian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    l1_norm = torch.sum(torch.abs(diff), dim=-1)
    return torch.exp(-l1_norm / self.sigma)
spectral_density
spectral_density(omega: Tensor) -> Tensor

Spectral density for Laplacian kernel (Cauchy distribution).

Parameters:

Name Type Description Default
omega Tensor

Frequency vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py
def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Laplacian kernel (Cauchy distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    # Product of 1D Cauchy densities
    density = torch.ones_like(omega[..., 0])
    for i in range(d):
        density = density * (
            2 * self.sigma / (math.pi * (1 + (self.sigma * omega[..., i]) ** 2))
        )
    return density
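
The link between Cauchy-distributed frequencies and the Laplacian kernel can be checked numerically. The sketch below is standalone and assumes each frequency coordinate is drawn independently from a Cauchy distribution with scale 1 / sigma; it does not reproduce the library's parameter initialization, which is not shown here.

```python
import math
import torch

sigma = 2.0
num_samples = 200_000
gen = torch.Generator().manual_seed(0)

# Inverse-CDF sampling: tan(pi * (u - 1/2)) is standard Cauchy for u ~ U(0, 1);
# dividing by sigma gives scale 1 / sigma. Coordinates are drawn independently.
u = torch.rand(num_samples, 3, generator=gen, dtype=torch.float64)
omega = torch.tan(math.pi * (u - 0.5)) / sigma

delta = torch.tensor([0.5, -1.0, 0.25], dtype=torch.float64)  # plays the role of x - y

# Monte Carlo estimate of E[cos(omega^T delta)] versus exp(-||delta||_1 / sigma).
estimate = torch.cos(omega @ delta).mean().item()
exact = math.exp(-delta.abs().sum().item() / sigma)
```

Because the Cauchy distribution has heavy tails, individual frequencies can be very large; the cosine keeps each sample bounded, so the average still converges.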

OrthogonalRandomFeatures

OrthogonalRandomFeatures(input_dim: int, num_features: int, kernel_type: Literal['gaussian', 'laplacian'] = 'gaussian', sigma: float = 1.0, use_hadamard: bool = False, trainable: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Orthogonal Random Features for kernel approximation.

Uses structured orthogonal matrices to reduce approximation variance compared to standard i.i.d. Gaussian features.

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random features.

required
kernel_type Literal['gaussian', 'laplacian']

Type of kernel to approximate.

"gaussian"
sigma float

Kernel bandwidth parameter.

1.0
use_hadamard bool

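If True, replace the dense projection with three rounds of diagonal scaling followed by a fast Hadamard transform (the HD HD HD structure), zero-padding the input to the transform size when needed.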

False
trainable bool

If True, make scaling parameters trainable.

False
seed int | None

Random seed.

None

Methods:

Name Description
forward

Apply orthogonal random feature map.

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["gaussian", "laplacian"] = "gaussian",
    sigma: float = 1.0,
    use_hadamard: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.kernel_type = kernel_type
    self.sigma = sigma
    self.use_hadamard = use_hadamard
    self.trainable = trainable

    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply orthogonal random feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply orthogonal random feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    if self.use_hadamard:
        # Pad input if necessary
        if x.shape[-1] < self.d_padded:
            padding = self.d_padded - x.shape[-1]
            x = F.pad(x, (0, padding))

        # Apply HD HD HD structure
        z = x
        for i in range(3):
            if hasattr(self, "diagonals"):
                diag = self.diagonals[i]
            else:
                diag = getattr(self, f"diagonal_{i}")
            z = z * diag
            z = self._hadamard_transform(z)

        # Truncate to desired number of features
        projection = z[..., : self.num_features]
    else:
        projection = torch.matmul(x, self.projection)

    # Add bias and apply cosine
    projection = projection + self.bias
    features = torch.cos(projection)

    # Normalize
    scale = math.sqrt(2.0 / self.num_features)
    return features * scale
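
The dense branch relies on a precomputed `self.projection`, whose construction is not shown on this page. One common way to build an orthogonal Gaussian projection is sketched below as an assumption, not as the library's method: orthogonalize a Gaussian matrix with QR, then rescale the columns so their norms are distributed like those of Gaussian vectors. The function name is hypothetical.

```python
import torch


def orthogonal_gaussian(input_dim, num_features, generator):
    """Stack square blocks with orthogonal columns and Gaussian-like norms.

    Returns a matrix of shape (input_dim, num_features).
    """
    blocks = []
    remaining = num_features
    while remaining > 0:
        gaussian = torch.randn(
            input_dim, input_dim, generator=generator, dtype=torch.float64
        )
        q, _ = torch.linalg.qr(gaussian)  # orthonormal columns
        # Draw column norms as norms of fresh Gaussian vectors
        # (chi-distributed with input_dim degrees of freedom).
        norms = torch.randn(
            input_dim, input_dim, generator=generator, dtype=torch.float64
        ).norm(dim=0)
        blocks.append(q * norms)  # broadcasting scales each column
        remaining -= input_dim
    return torch.cat(blocks, dim=1)[:, :num_features]


gen = torch.Generator().manual_seed(0)
w = orthogonal_gaussian(input_dim=8, num_features=20, generator=gen)   # (8, 20)

# Columns inside one block are mutually orthogonal.
block = w[:, :8]
gram = block.T @ block
off_diagonal = gram - torch.diag(torch.diagonal(gram))
```

For a Gaussian kernel with length scale sigma, such a matrix would additionally be divided by sigma before use.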

RFFAttentionKernel

RFFAttentionKernel(input_dim: int, num_features: int, kernel_type: Literal['softmax', 'relu', 'elu'] = 'softmax', use_orthogonal: bool = True, redraw: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Random Fourier Features specifically designed for attention mechanisms.

Implements positive random features for use in linear attention, following the Performer architecture.

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors (typically head_dim).

required
num_features int

Number of random features.

required
kernel_type Literal['softmax', 'relu', 'elu']

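Type of kernel approximation: "softmax" uses positive exponential features, "relu" applies ReLU to the random projection, and "elu" applies ELU + 1.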

"softmax"
use_orthogonal bool

If True, use orthogonal random features.

True
redraw bool

If True, redraw random features at each forward pass.

False
seed int | None

Random seed.

None

Methods:

Name Description
forward

Apply random feature map for attention.

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["softmax", "relu", "elu"] = "softmax",
    use_orthogonal: bool = True,
    redraw: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, seed=seed)

    self.kernel_type = kernel_type
    self.use_orthogonal = use_orthogonal
    self.redraw = redraw

    if not redraw:
        self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random feature map for attention.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Positive feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random feature map for attention.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Positive feature mapped tensor of shape (..., n, D).
    """
    if self.redraw:
        # Redraw random features (useful for training)
        projection = (
            self._sample_orthogonal_gaussian()
            if self.use_orthogonal
            else torch.randn(self.input_dim, self.num_features, device=x.device)
        )
        projection = projection / math.sqrt(self.input_dim)
    else:
        projection = self.projection

    # Linear projection
    z = torch.matmul(x, projection)

    if self.kernel_type == "softmax":
        # Positive features for softmax kernel approximation
        # $\varphi(\mathbf{x}) = \exp(\mathbf{x}^T \omega - \|\mathbf{x}\|^2/2) / \sqrt{m}$
        x_norm_sq = torch.sum(x**2, dim=-1, keepdim=True) / 2
        features = torch.exp(z - x_norm_sq)
        scale = 1.0 / math.sqrt(self.num_features)

    elif self.kernel_type == "relu":
        # ReLU kernel: $\max(0, \mathbf{x}^T \omega)$
        features = F.relu(z)
        scale = math.sqrt(2.0 / self.num_features)

    else:  # elu
        # ELU kernel for smooth approximation
        features = F.elu(z) + 1
        scale = 1.0 / math.sqrt(self.num_features)

    return features * scale
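
The softmax branch follows the positive random feature idea from the Performer paper cited above. The standalone sketch below checks the underlying identity numerically under one explicit assumption: the projection has i.i.d. standard normal entries with no additional rescaling, which differs from the `1 / sqrt(input_dim)` scaling in the redraw branch. The function name is hypothetical.

```python
import math
import torch


def positive_softmax_features(x, omega):
    """exp(omega^T x - ||x||^2 / 2) / sqrt(m) for each row of x."""
    num_features = omega.shape[-1]
    half_sq_norm = x.pow(2).sum(dim=-1, keepdim=True) / 2
    return torch.exp(x @ omega - half_sq_norm) / math.sqrt(num_features)


gen = torch.Generator().manual_seed(0)
x = torch.tensor([[0.3, -0.2, 0.5, 0.1]], dtype=torch.float64)
y = torch.tensor([[0.1, 0.4, -0.3, 0.2]], dtype=torch.float64)

# Assumption: omega has i.i.d. standard normal entries (no extra rescaling).
omega = torch.randn(4, 200_000, generator=gen, dtype=torch.float64)

phi_x = positive_softmax_features(x, omega)
phi_y = positive_softmax_features(y, omega)

estimate = (phi_x @ phi_y.T).item()
exact = math.exp((x @ y.T).item())   # softmax kernel exp(<x, y>)
```

Every feature is strictly positive, so the estimated kernel values and the attention normalizers built from them cannot become negative, unlike with cosine features.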

FourierKernel

FourierKernel(rank: int, input_dim: int, learnable_filter: bool = True, filter_type: Literal['gaussian', 'butterworth', 'ideal'] = 'gaussian', cutoff_freq: float = 0.5)

Bases: Module, SpectralKernel

Kernel defined in Fourier domain.

Defines kernel through spectral filters in frequency space.

Parameters:

Name Type Description Default
rank int

Number of Fourier modes.

required
input_dim int

Input dimension.

required
learnable_filter bool

Whether filter is learnable.

True
filter_type Literal['gaussian', 'butterworth', 'ideal']

Type of spectral filter.

"gaussian"
cutoff_freq float

Normalized cutoff frequency.

0.5

Attributes:

Name Type Description
filter Parameter or Tensor

Spectral filter of shape (rank,).

Methods:

Name Description
compute

Compute Fourier kernel.

Source code in spectrans/kernels/spectral.py
def __init__(
    self,
    rank: int,
    input_dim: int,
    learnable_filter: bool = True,
    filter_type: Literal["gaussian", "butterworth", "ideal"] = "gaussian",
    cutoff_freq: float = 0.5,
):
    # Use super() to initialize nn.Module (first in MRO)
    super().__init__()
    # Manually set attributes that SpectralKernel.__init__ would set
    self.rank = rank
    self.normalize = True

    self.input_dim = input_dim
    self.filter_type = filter_type
    self.cutoff_freq = cutoff_freq

    # Initialize spectral filter
    filter_vals = self._init_filter()

    if learnable_filter:
        self.filter = nn.Parameter(filter_vals)
    else:
        self.register_buffer("filter", filter_vals)
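The difference between the two branches above is whether the filter receives gradients. The following sketch illustrates this with a hypothetical `FilterHolder` module and placeholder filter values; it is not the library's `_init_filter` logic.

```python
import torch
from torch import nn


class FilterHolder(nn.Module):
    """Hypothetical module that stores a filter as a parameter or as a buffer."""

    def __init__(self, rank: int, learnable_filter: bool = True):
        super().__init__()
        filter_vals = torch.ones(rank)  # placeholder values, not the library's init
        if learnable_filter:
            self.filter = nn.Parameter(filter_vals)
        else:
            self.register_buffer("filter", filter_vals)


learnable = FilterHolder(rank=4, learnable_filter=True)
fixed = FilterHolder(rank=4, learnable_filter=False)

n_learnable = sum(p.numel() for p in learnable.parameters())  # filter is trainable
n_fixed = sum(p.numel() for p in fixed.parameters())          # filter is a buffer
in_state_dict = "filter" in fixed.state_dict()                # buffers are still saved
```

In both cases the filter moves with the module on `.to(device)` and is stored in the state dict; only the parameter version is seen by an optimizer.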
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute Fourier kernel.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/spectral.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute Fourier kernel.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    # Compute FFT of inputs
    x_freq = safe_rfft(x, dim=-1)
    y_freq = safe_rfft(y, dim=-1)

    # Truncate to rank modes
    x_freq = x_freq[..., : self.rank]
    y_freq = y_freq[..., : self.rank]

    # Apply spectral filter
    x_filtered = x_freq * self.filter
    y_filtered = y_freq * self.filter

    # Pairwise cross-spectrum over the retained modes:
    # X_filtered[i] * conj(Y_filtered[j]), shape (..., n, m, rank)
    kernel_freq = x_filtered.unsqueeze(-2) * y_filtered.unsqueeze(-3).conj()

    # Average over frequency dimension
    kernel: Tensor = kernel_freq.real.mean(dim=-1)

    return kernel
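The computation above can be sketched as a standalone function. This version assumes `torch.fft.rfft` as a stand-in for the library's `safe_rfft` helper and a hand-built Gaussian-shaped filter; the exact filter produced by `_init_filter` is not shown in this page, so treat the values below as illustrative only.

```python
import torch


def fourier_kernel(x: torch.Tensor, y: torch.Tensor, filt: torch.Tensor) -> torch.Tensor:
    """Filtered cross-spectrum kernel; x is (..., n, d), y is (..., m, d)."""
    rank = filt.shape[-1]
    x_freq = torch.fft.rfft(x, dim=-1)[..., :rank] * filt    # (..., n, rank)
    y_freq = torch.fft.rfft(y, dim=-1)[..., :rank] * filt    # (..., m, rank)
    # Broadcast to all (i, j) pairs: (..., n, 1, rank) * (..., 1, m, rank)
    cross = x_freq.unsqueeze(-2) * y_freq.unsqueeze(-3).conj()
    return cross.real.mean(dim=-1)                            # (..., n, m)


torch.manual_seed(0)
rank, d = 4, 16                      # rank must not exceed the d // 2 + 1 rfft bins
freqs = torch.linspace(0.0, 1.0, rank)
gaussian_filter = torch.exp(-0.5 * (freqs / 0.5) ** 2)       # assumed filter shape
x = torch.randn(2, 6, d)
y = torch.randn(2, 5, d)
K_xy = fourier_kernel(x, y, gaussian_filter)
K_xx = fourier_kernel(x, x, gaussian_filter)
```

Because the real part of a cross-spectrum is symmetric in its two arguments, `K_xx` is a symmetric matrix with a non-negative diagonal.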

LearnableSpectralKernel

LearnableSpectralKernel(input_dim: int, rank: int, init_scale: float = 1.0, trainable_eigenvectors: bool = True, normalize: bool = True)

Bases: Module, SpectralKernel

Spectral kernel with learnable eigenvalues and eigenfunctions.

Parameters:

Name Type Description Default
input_dim int

Input dimension.

required
rank int

Number of spectral components.

required
init_scale float

Initialization scale.

1.0
trainable_eigenvectors bool

Whether eigenvectors are trainable.

True
normalize bool

Whether to L2-normalize each row of the kernel matrix.

True

Attributes:

Name Type Description
eigenvectors Parameter

Learnable eigenvectors of shape (input_dim, rank).

eigenvalues Parameter

Learnable eigenvalues of shape (rank,).

Methods:

Name Description
compute

Compute learnable spectral kernel.

extract_features

Extract spectral features.

forward

Forward pass for nn.Module compatibility.

orthogonalize_eigenvectors

Re-orthonormalize eigenvectors via QR decomposition.

Source code in spectrans/kernels/spectral.py
def __init__(
    self,
    input_dim: int,
    rank: int,
    init_scale: float = 1.0,
    trainable_eigenvectors: bool = True,
    normalize: bool = True,
):
    nn.Module.__init__(self)
    SpectralKernel.__init__(self, rank, normalize)

    self.input_dim = input_dim
    self.trainable_eigenvectors = trainable_eigenvectors

    # Initialize eigenvectors (orthogonal)
    eigenvectors = torch.randn(input_dim, rank) * init_scale
    eigenvectors, _ = torch.linalg.qr(eigenvectors)

    if trainable_eigenvectors:
        self.eigenvectors = nn.Parameter(eigenvectors)
    else:
        self.register_buffer("eigenvectors", eigenvectors)

    # Initialize eigenvalues (positive, decreasing)
    eigenvalues = torch.linspace(1.0, 0.1, rank) * init_scale
    self.eigenvalues = nn.Parameter(eigenvalues)
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute learnable spectral kernel.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/spectral.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute learnable spectral kernel.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    # Project to eigenspace
    x_proj = torch.matmul(x, self.eigenvectors)  # (..., n, r)
    y_proj = torch.matmul(y, self.eigenvectors)  # (..., m, r)

    # Apply eigenvalue weighting
    x_weighted = x_proj * torch.sqrt(torch.abs(self.eigenvalues) + 1e-8)
    y_weighted = y_proj * torch.sqrt(torch.abs(self.eigenvalues) + 1e-8)

    # Compute kernel
    kernel = torch.matmul(x_weighted, y_weighted.transpose(-2, -1))

    if self.normalize:
        # Row normalization
        kernel = F.normalize(kernel, p=2, dim=-1)

    return kernel
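Without normalization, the kernel above is a bilinear form in the learned eigenpairs. The sketch below re-implements the computation as a free function (the name `spectral_kernel` is hypothetical) and compares it against that bilinear form.

```python
import torch
import torch.nn.functional as F


def spectral_kernel(x, y, eigenvectors, eigenvalues, normalize=True, eps=1e-8):
    """K = (x V sqrt(|lambda|)) (y V sqrt(|lambda|))^T, optionally row-normalized."""
    weights = torch.sqrt(eigenvalues.abs() + eps)            # (r,)
    x_feat = (x @ eigenvectors) * weights                    # (..., n, r)
    y_feat = (y @ eigenvectors) * weights                    # (..., m, r)
    kernel = x_feat @ y_feat.transpose(-2, -1)               # (..., n, m)
    return F.normalize(kernel, p=2, dim=-1) if normalize else kernel


torch.manual_seed(0)
d, r = 12, 4
V, _ = torch.linalg.qr(torch.randn(d, r))                    # orthonormal columns
lam = torch.linspace(1.0, 0.1, r)
x, y = torch.randn(3, 7, d), torch.randn(3, 5, d)

K = spectral_kernel(x, y, V, lam, normalize=False)
# Same kernel written as the bilinear form x (V diag(|lambda| + eps) V^T) y^T
K_bilinear = x @ (V * (lam.abs() + 1e-8)) @ V.T @ y.transpose(-2, -1)
K_rows = spectral_kernel(x, y, V, lam, normalize=True)
```

Taking the absolute value of the eigenvalues keeps the bilinear form positive semi-definite even if training pushes some eigenvalues negative.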
extract_features
extract_features(x: Tensor) -> Tensor

Extract spectral features.

Parameters:

Name Type Description Default
x Tensor

Input of shape (..., n, d).

required

Returns:

Type Description
Tensor

Spectral features of shape (..., n, r).

Source code in spectrans/kernels/spectral.py
def extract_features(self, x: Tensor) -> Tensor:
    """Extract spectral features.

    Parameters
    ----------
    x : Tensor
        Input of shape (..., n, d).

    Returns
    -------
    Tensor
        Spectral features of shape (..., n, r).
    """
    # Project to eigenspace
    features = torch.matmul(x, self.eigenvectors)

    # Weight by eigenvalues
    features = features * torch.sqrt(torch.abs(self.eigenvalues) + 1e-8)

    return features
forward
forward(x: Tensor, y: Tensor | None = None) -> Tensor

Forward pass for nn.Module compatibility.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor | None

Second input. If None, returns features.

None

Returns:

Type Description
Tensor

Kernel matrix or features.

Source code in spectrans/kernels/spectral.py
def forward(self, x: Tensor, y: Tensor | None = None) -> Tensor:
    """Forward pass for nn.Module compatibility.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor | None, default=None
        Second input. If None, returns features.

    Returns
    -------
    Tensor
        Kernel matrix or features.
    """
    if y is None:
        return self.extract_features(x)
    else:
        return self.compute(x, y)
orthogonalize_eigenvectors
orthogonalize_eigenvectors() -> None

Re-orthonormalize eigenvectors via QR decomposition.

Source code in spectrans/kernels/spectral.py
def orthogonalize_eigenvectors(self) -> None:
    """Re-orthonormalize eigenvectors via QR decomposition."""
    if self.trainable_eigenvectors:
        with torch.no_grad():
            Q, _ = torch.linalg.qr(self.eigenvectors)
            self.eigenvectors.data = Q
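A short sketch of the re-orthonormalization step, using a free-standing parameter in place of the module attribute. It uses an in-place `copy_` rather than assigning to `.data`; that is a stylistic choice of this example, not a statement about the library.

```python
import torch

torch.manual_seed(0)
input_dim, rank = 16, 4
# Stand-in for eigenvectors that drifted away from orthonormality during training.
eigenvectors = torch.nn.Parameter(torch.randn(input_dim, rank))

with torch.no_grad():
    Q, _ = torch.linalg.qr(eigenvectors)   # reduced QR: Q is (input_dim, rank)
    eigenvectors.copy_(Q)                   # in-place update keeps the Parameter

gram = eigenvectors.T @ eigenvectors        # should be the rank x rank identity
```

Calling such a step periodically during training (for example after each optimizer step) keeps the columns orthonormal without adding a penalty term to the loss.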

PolynomialSpectralKernel

PolynomialSpectralKernel(rank: int, degree: int = 2, coef0: float = 1.0, alpha: float = 1.0, normalize: bool = True)

Bases: SpectralKernel

Polynomial kernel with spectral decomposition.

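Computes \((\alpha\,\mathbf{X}\mathbf{Y}^T + c)^d\) with scale \(\alpha\) (alpha), offset \(c\) (coef0), and degree \(d\) (degree); compute_attention evaluates the same polynomial on rank-reduced inputs.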

Parameters:

Name Type Description Default
rank int

Rank of spectral approximation.

required
degree int

Polynomial degree.

2
coef0 float

Constant coefficient.

1.0
alpha float

Scaling factor.

1.0
normalize bool

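Whether to normalize: compute divides by the product of the input norms, and compute_attention applies a row-wise softmax.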

True

Attributes:

Name Type Description
degree int

Polynomial degree.

coef0 float

Constant term.

alpha float

Scale factor.

Methods:

Name Description
compute

Compute polynomial spectral kernel.

compute_attention

Compute attention weights using spectral decomposition.

Source code in spectrans/kernels/spectral.py
def __init__(
    self,
    rank: int,
    degree: int = 2,
    coef0: float = 1.0,
    alpha: float = 1.0,
    normalize: bool = True,
):
    super().__init__(rank, normalize)
    self.degree = degree
    self.coef0 = coef0
    self.alpha = alpha
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute polynomial spectral kernel.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/spectral.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute polynomial spectral kernel.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    # Standard polynomial kernel
    inner = torch.matmul(x, y.transpose(-2, -1))
    kernel = (self.alpha * inner + self.coef0) ** self.degree

    if self.normalize:
        # Normalize by the product of the input norms ||x_i|| * ||y_j||
        x_norm = torch.norm(x, dim=-1, keepdim=True)
        y_norm = torch.norm(y, dim=-1, keepdim=True)
        norm_matrix = torch.matmul(x_norm, y_norm.transpose(-2, -1))
        kernel = kernel / (norm_matrix + 1e-8)

    return kernel
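A worked example with small inputs makes the normalization concrete. The function below mirrors the computation above as a free function; the name `polynomial_kernel` is hypothetical.

```python
import torch


def polynomial_kernel(x, y, degree=2, coef0=1.0, alpha=1.0, normalize=True, eps=1e-8):
    """(alpha * x y^T + coef0) ** degree, optionally divided by ||x_i|| * ||y_j||."""
    kernel = (alpha * (x @ y.transpose(-2, -1)) + coef0) ** degree
    if normalize:
        x_norm = x.norm(dim=-1, keepdim=True)                # (..., n, 1)
        y_norm = y.norm(dim=-1, keepdim=True)                # (..., m, 1)
        kernel = kernel / (x_norm @ y_norm.transpose(-2, -1) + eps)
    return kernel


x = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
K_raw = polynomial_kernel(x, x, normalize=False)   # [[4, 1], [1, 25]]
K_norm = polynomial_kernel(x, x, normalize=True)   # [[4, 0.5], [0.5, 6.25]]
```

Note that dividing by the product of norms does not make the diagonal equal to 1, so this differs from the cosine-style normalization K(x, y) / sqrt(K(x, x) K(y, y)).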
compute_attention
compute_attention(q: Tensor, k: Tensor) -> Tensor

Compute attention weights using spectral decomposition.

Parameters:

Name Type Description Default
q Tensor

Queries of shape (..., n, d).

required
k Tensor

Keys of shape (..., m, d).

required

Returns:

Type Description
Tensor

Attention weights of shape (..., n, m).

Source code in spectrans/kernels/spectral.py
def compute_attention(self, q: Tensor, k: Tensor) -> Tensor:
    """Compute attention weights using spectral decomposition.

    Parameters
    ----------
    q : Tensor
        Queries of shape (..., n, d).
    k : Tensor
        Keys of shape (..., m, d).

    Returns
    -------
    Tensor
        Attention weights of shape (..., n, m).
    """
    # Low-rank approximation via SVD
    # Q = U_q S_q V_q^T, K = U_k S_k V_k^T

    # Compute QK^T approximately
    q_reduced = self._reduce_rank(q)  # (..., n, r)
    k_reduced = self._reduce_rank(k)  # (..., m, r)

    # Polynomial kernel in reduced space
    inner = torch.matmul(q_reduced, k_reduced.transpose(-2, -1))
    attention = (self.alpha * inner + self.coef0) ** self.degree

    if self.normalize:
        attention = F.softmax(attention, dim=-1)

    return attention
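The `_reduce_rank` helper is not shown on this page. The sketch below substitutes a hypothetical `reduce_rank` that projects onto the top-r right singular vectors of the stacked queries and keys; this is an assumption made for illustration and may differ from what the library does. The polynomial scoring and softmax follow the code above with alpha=1, coef0=1, degree=2.

```python
import torch
import torch.nn.functional as F


def reduce_rank(x: torch.Tensor, basis: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for `_reduce_rank`: project (..., n, d) onto (d, r)."""
    return x @ basis


torch.manual_seed(0)
n, m, d, r = 6, 5, 16, 4
q, k = torch.randn(n, d), torch.randn(m, d)

# Assumed choice of basis: top-r right singular vectors of the stacked inputs.
_, _, Vh = torch.linalg.svd(torch.cat([q, k], dim=0), full_matrices=False)
basis = Vh[:r].T                                             # (d, r)

inner = reduce_rank(q, basis) @ reduce_rank(k, basis).T      # (n, m)
scores = (1.0 * inner + 1.0) ** 2                            # alpha=1, coef0=1, degree=2
attention = F.softmax(scores, dim=-1)                        # rows sum to 1
```

Using one shared basis for queries and keys keeps their reduced coordinates comparable; projecting each with its own basis would make the inner products harder to interpret.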

SpectralKernel

SpectralKernel(rank: int, normalize: bool = True)

Bases: KernelFunction

Base class for spectral kernel functions.

Spectral kernels use eigendecomposition or spectral analysis for efficient kernel computation.

Parameters:

Name Type Description Default
rank int

Rank of spectral approximation.

required
normalize bool

Whether to normalize kernel values.

True

Attributes:

Name Type Description
rank int

Approximation rank.

normalize bool

Normalization flag.

Methods:

Name Description
spectral_decomposition

Compute spectral decomposition of input.

Source code in spectrans/kernels/spectral.py
def __init__(self, rank: int, normalize: bool = True):
    self.rank = rank
    self.normalize = normalize
Functions
spectral_decomposition
spectral_decomposition(x: Tensor) -> tuple[Tensor, Tensor]

Compute spectral decomposition of input.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Name Type Description
eigenvectors Tensor

Eigenvectors of shape (..., n, rank).

eigenvalues Tensor

Eigenvalues of shape (..., rank).

Source code in spectrans/kernels/spectral.py
def spectral_decomposition(self, x: Tensor) -> tuple[Tensor, Tensor]:
    """Compute spectral decomposition of input.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    eigenvectors : Tensor
        Eigenvectors of shape (..., n, rank).
    eigenvalues : Tensor
        Eigenvalues of shape (..., rank).
    """
    # Compute Gram matrix
    gram = torch.matmul(x, x.transpose(-2, -1))

    # Eigendecomposition
    eigenvalues, eigenvectors = torch.linalg.eigh(gram)

    # Keep top-k eigenvalues/vectors
    eigenvalues = eigenvalues[..., -self.rank :]
    eigenvectors = eigenvectors[..., -self.rank :]

    if self.normalize:
        # Normalize by trace
        trace = eigenvalues.sum(dim=-1, keepdim=True)
        eigenvalues = eigenvalues / (trace + 1e-8)

    return eigenvectors, eigenvalues
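The decomposition above can be verified on random data. This sketch re-implements it as a free function (the name `top_k_eigenpairs` is hypothetical) and checks the eigen-equation on the Gram matrix.

```python
import torch


def top_k_eigenpairs(x: torch.Tensor, rank: int, normalize: bool = True):
    """Top-`rank` eigenpairs of the Gram matrix x x^T for x of shape (..., n, d)."""
    gram = x @ x.transpose(-2, -1)                           # (..., n, n), symmetric PSD
    eigenvalues, eigenvectors = torch.linalg.eigh(gram)      # ascending eigenvalues
    eigenvalues = eigenvalues[..., -rank:]                   # (..., rank)
    eigenvectors = eigenvectors[..., -rank:]                 # (..., n, rank), columns
    if normalize:
        eigenvalues = eigenvalues / (eigenvalues.sum(dim=-1, keepdim=True) + 1e-8)
    return eigenvectors, eigenvalues


torch.manual_seed(0)
x = torch.randn(2, 10, 6, dtype=torch.float64)
vecs, vals = top_k_eigenpairs(x, rank=3, normalize=False)
gram = x @ x.transpose(-2, -1)
residual = gram @ vecs - vecs * vals.unsqueeze(-2)           # G v - lambda v
_, vals_normalized = top_k_eigenpairs(x, rank=3, normalize=True)
```

Because `torch.linalg.eigh` returns eigenvalues in ascending order, the retained eigenpairs are also ordered from smallest to largest. With normalization, the division is by the sum of the retained eigenvalues, not the trace of the full Gram matrix.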

TruncatedSVDKernel

TruncatedSVDKernel(rank: int, normalize: bool = True, use_randomized: bool = False)

Bases: SpectralKernel

Kernel approximation via truncated SVD.

Uses SVD to compute low-rank approximation of kernel matrix.

Parameters:

Name Type Description Default
rank int

Truncation rank.

required
normalize bool

Whether to normalize each row of the approximate kernel to unit L2 norm.

True
use_randomized bool

Use randomized SVD for large matrices.

False

Attributes:

Name Type Description
use_randomized bool

Whether to use randomized algorithms.

Methods:

Name Description
compute

Compute kernel via truncated SVD.

Source code in spectrans/kernels/spectral.py
def __init__(
    self,
    rank: int,
    normalize: bool = True,
    use_randomized: bool = False,
):
    super().__init__(rank, normalize)
    self.use_randomized = use_randomized
Functions
compute
compute(x: Tensor, y: Tensor) -> Tensor

Compute kernel via truncated SVD.

Parameters:

Name Type Description Default
x Tensor

First input of shape (..., n, d).

required
y Tensor

Second input of shape (..., m, d).

required

Returns:

Type Description
Tensor

Approximate kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/spectral.py
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute kernel via truncated SVD.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Approximate kernel matrix of shape (..., n, m).
    """
    # Compute full kernel matrix
    kernel_full = torch.matmul(x, y.transpose(-2, -1))

    if self.use_randomized:
        # Randomized SVD (faster for large matrices)
        kernel_approx = self._randomized_svd_approximation(kernel_full)
    else:
        # Standard SVD
        U, S, Vt = torch.linalg.svd(kernel_full, full_matrices=False)

        # Truncate to rank
        U_r = U[..., : self.rank]
        S_r = S[..., : self.rank]
        Vt_r = Vt[..., : self.rank, :]

        # Reconstruct
        kernel_approx = torch.matmul(U_r * S_r.unsqueeze(-2), Vt_r)

    if self.normalize:
        # Normalize rows
        row_norms = kernel_approx.norm(dim=-1, keepdim=True)
        kernel_approx = kernel_approx / (row_norms + 1e-8)

    return kernel_approx
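The standard-SVD branch above can be sketched in isolation. The helper name `truncated_svd_approx` is hypothetical, and the randomized branch is omitted because `_randomized_svd_approximation` is not shown on this page. The check uses the Eckart-Young result: the Frobenius error of the best rank-r approximation equals the root of the sum of the squared discarded singular values.

```python
import torch


def truncated_svd_approx(kernel: torch.Tensor, rank: int) -> torch.Tensor:
    """Best rank-`rank` approximation (in Frobenius norm) of a (..., n, m) matrix."""
    U, S, Vh = torch.linalg.svd(kernel, full_matrices=False)
    return (U[..., :rank] * S[..., :rank].unsqueeze(-2)) @ Vh[..., :rank, :]


torch.manual_seed(0)
x = torch.randn(8, 6, dtype=torch.float64)
y = torch.randn(7, 6, dtype=torch.float64)
kernel_full = x @ y.T                                        # (8, 7), rank <= 6
kernel_r = truncated_svd_approx(kernel_full, rank=3)

S = torch.linalg.svdvals(kernel_full)
error = torch.linalg.norm(kernel_full - kernel_r)            # Frobenius norm
expected_error = S[3:].pow(2).sum().sqrt()                   # Eckart-Young
```

Since the full n x m kernel matrix is formed before the SVD, this approach reduces the rank of the result but not the cost of computing it.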