Base Kernel Classes¶

spectrans.kernels.base ¶

Base interfaces and classes for kernel functions and random feature maps.

This module defines the abstract base classes for kernel functions used in spectral attention mechanisms and other kernel-based methods. It provides interfaces for both explicit kernel evaluations and implicit feature map representations through random features.

The kernel framework supports various approximation techniques including Random Fourier Features (RFF), polynomial kernels, and spectral kernels, enabling computation of attention mechanisms with linear complexity.

Classes:

Name	Description
`KernelFunction`	Abstract base class for kernel functions \(k(\mathbf{x}, \mathbf{y})\).
`RandomFeatureMap`	Abstract base class for random feature approximations.
`ShiftInvariantKernel`	Base class for shift-invariant (stationary) kernels.

Examples:

Implementing a custom kernel:

>>> import torch
>>> from spectrans.kernels.base import KernelFunction
>>> class LinearKernel(KernelFunction):
...     def compute(self, x, y):
...         return torch.matmul(x, y.transpose(-2, -1))

Using a random feature map:

>>> from spectrans.kernels.base import RandomFeatureMap
>>> class CustomFeatureMap(RandomFeatureMap):
...     def __init__(self, input_dim, num_features):
...         super().__init__(input_dim, num_features)
...         # Initialize random parameters
...     def forward(self, x):
...         # Return feature mapped tensor
...         pass

Notes

For shift-invariant kernels, Bochner's theorem states that any positive definite shift-invariant kernel can be represented as the Fourier transform of a non-negative measure:

\[ k(\mathbf{x} - \mathbf{y}) = \int p(\omega) \exp(i\omega^T(\mathbf{x}-\mathbf{y})) d\omega \]

This representation enables Random Fourier Features approximation through Monte Carlo sampling, where the kernel is approximated by the inner product of explicit feature maps \(k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})\). The feature map takes the form \(\varphi(\mathbf{x}) = \sqrt{\frac{2}{D}} [\cos(\omega_1^T\mathbf{x} + b_1), \ldots, \cos(\omega_D^T\mathbf{x} + b_D)]\) where \(\omega_i\) are sampled from \(p(\omega)\) and \(b_i\) from \(\text{Uniform}[0, 2\pi]\).

The approximation error decreases with \(O(1/\sqrt{D})\) where \(D\) is the number of random features.

References

Ali Rahimi and Benjamin Recht. 2007. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20 (NeurIPS 2007), pages 1177-1184.

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, and Adrian Weller. 2021. Rethinking attention with performers. In Proceedings of the International Conference on Learning Representations (ICLR).

Classes¶

KernelFunction ¶

Bases: ABC

Abstract base class for kernel functions.

A kernel function \(k(\mathbf{x}, \mathbf{y})\) defines a similarity measure between inputs \(\mathbf{x}\) and \(\mathbf{y}\), satisfying positive semi-definiteness properties. This interface supports both explicit kernel evaluation and feature map representations.

Methods:

Name	Description
`compute`	Compute kernel values between x and y.
`gram_matrix`	Compute Gram matrix \(K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\).
`is_positive_definite`	Check if the kernel yields a positive definite Gram matrix.

Functions¶

compute `abstractmethod` ¶

compute(x: Tensor, y: Tensor) -> Tensor

Compute kernel values between x and y.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	First input tensor of shape (..., n, d).	required
`y`	`Tensor`	Second input tensor of shape (..., m, d).	required

Returns:

Type	Description
`Tensor`	Kernel matrix of shape (..., n, m) where element \((i,j)\) contains \(k(\mathbf{x}_i, \mathbf{y}_j)\).

Source code in spectrans/kernels/base.py

@abstractmethod
def compute(self, x: Tensor, y: Tensor) -> Tensor:
    r"""Compute kernel values between x and y.

    Parameters
    ----------
    x : Tensor
        First input tensor of shape (..., n, d).
    y : Tensor
        Second input tensor of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m) where element $(i,j)$
        contains $k(\mathbf{x}_i, \mathbf{y}_j)$.
    """
    pass

gram_matrix ¶

gram_matrix(x: Tensor) -> Tensor

Compute Gram matrix \(K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)\).

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Gram matrix of shape (..., n, n).

Source code in spectrans/kernels/base.py

def gram_matrix(self, x: Tensor) -> Tensor:
    r"""Compute Gram matrix $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Gram matrix of shape (..., n, n).
    """
    return self.compute(x, x)

is_positive_definite ¶

is_positive_definite(x: Tensor, eps: float = 1e-06) -> bool

Check if the kernel yields a positive definite Gram matrix.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required
`eps`	`float`	Tolerance for eigenvalue positivity check.	`1e-6`

Returns:

Type	Description
`bool`	True if all eigenvalues of Gram matrix are > eps.

Source code in spectrans/kernels/base.py

def is_positive_definite(self, x: Tensor, eps: float = 1e-6) -> bool:
    """Check if the kernel yields a positive definite Gram matrix.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).
    eps : float, default=1e-6
        Tolerance for eigenvalue positivity check.

    Returns
    -------
    bool
        True if all eigenvalues of Gram matrix are > eps.
    """
    gram = self.gram_matrix(x)
    eigenvalues = torch.linalg.eigvalsh(gram)
    return bool(torch.all(eigenvalues > eps).item())

RandomFeatureMap ¶

RandomFeatureMap(input_dim: int, num_features: int, kernel_scale: float = 1.0, seed: int | None = None)

Bases: Module, ABC

Abstract base class for random feature map approximations.

Random feature maps provide finite-dimensional approximations to kernel functions through the mapping:

.. math:: k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})

This enables linear-time computation of kernel operations.

Parameters:

Name	Type	Description	Default
`input_dim`	`int`	Dimension of input vectors.	required
`num_features`	`int`	Number of random features (D).	required
`kernel_scale`	`float`	Scaling parameter for the kernel.	`1.0`
`seed`	`int \| None`	Random seed for reproducibility.	`None`

Attributes:

Name	Type	Description
`input_dim`	`int`	Input dimension.
`num_features`	`int`	Number of random features.
`kernel_scale`	`float`	Kernel scaling parameter.

Methods:

Name	Description
`forward`	Apply feature map to input.
`kernel_approximation`	Approximate kernel matrix using feature maps.

Source code in spectrans/kernels/base.py

def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_scale: float = 1.0,
    seed: int | None = None,
):
    super().__init__()
    self.input_dim = input_dim
    self.num_features = num_features
    self.kernel_scale = kernel_scale

    if seed is not None:
        torch.manual_seed(seed)

Functions¶

forward `abstractmethod` ¶

forward(x: Tensor) -> Tensor

Apply feature map to input.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Feature mapped tensor of shape (..., n, D) where D is the number of random features.

Source code in spectrans/kernels/base.py

@abstractmethod
def forward(self, x: Tensor) -> Tensor:
    """Apply feature map to input.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D) where D
        is the number of random features.
    """
    pass

kernel_approximation ¶

kernel_approximation(x: Tensor, y: Tensor) -> Tensor

Approximate kernel matrix using feature maps.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	First input of shape (..., n, d).	required
`y`	`Tensor`	Second input of shape (..., m, d).	required

Returns:

Type	Description
`Tensor`	Approximated kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py

def kernel_approximation(self, x: Tensor, y: Tensor) -> Tensor:
    """Approximate kernel matrix using feature maps.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Approximated kernel matrix of shape (..., n, m).
    """
    phi_x = self.forward(x)  # (..., n, D)
    phi_y = self.forward(y)  # (..., m, D)
    return torch.matmul(phi_x, phi_y.transpose(-2, -1))

ShiftInvariantKernel ¶

ShiftInvariantKernel(bandwidth: float = 1.0)

Bases: KernelFunction

Base class for shift-invariant (stationary) kernels.

Shift-invariant kernels depend only on the difference \(\mathbf{x} - \mathbf{y}\), i.e., \(k(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y}, \mathbf{0})\) \(= \kappa(\mathbf{x} - \mathbf{y})\) for some function \(\kappa\).

These kernels admit Random Fourier Features approximation via Bochner's theorem.

Parameters:

Name	Type	Description	Default
`bandwidth`	`float`	Kernel bandwidth parameter (inverse of length scale).	`1.0`

Attributes:

Name	Type	Description
`bandwidth`	`float`	The bandwidth parameter.

Methods:

Name	Description
`evaluate_difference`	Evaluate kernel on difference vectors.
`compute`	Compute kernel matrix for shift-invariant kernel.
`spectral_density`	Fourier transform of the kernel (spectral density).

Source code in spectrans/kernels/base.py

def __init__(self, bandwidth: float = 1.0):
    self.bandwidth = bandwidth

Functions¶

evaluate_difference `abstractmethod` ¶

evaluate_difference(diff: Tensor) -> Tensor

Evaluate kernel on difference vectors.

Parameters:

Name	Type	Description	Default
`diff`	`Tensor`	Difference vectors \(\mathbf{x} - \mathbf{y}\) of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Kernel values \(\kappa(\text{diff})\) of shape (...).

Source code in spectrans/kernels/base.py

@abstractmethod
def evaluate_difference(self, diff: Tensor) -> Tensor:
    r"""Evaluate kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors $\mathbf{x} - \mathbf{y}$ of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values $\kappa(\text{diff})$ of shape (...).
    """
    pass

compute ¶

compute(x: Tensor, y: Tensor) -> Tensor

Compute kernel matrix for shift-invariant kernel.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	First input of shape (..., n, d).	required
`y`	`Tensor`	Second input of shape (..., m, d).	required

Returns:

Type	Description
`Tensor`	Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py

def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute kernel matrix for shift-invariant kernel.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    # Compute pairwise differences
    x_expanded = x.unsqueeze(-2)  # (..., n, 1, d)
    y_expanded = y.unsqueeze(-3)  # (..., 1, m, d)
    diff = x_expanded - y_expanded  # (..., n, m, d)

    # Evaluate kernel on differences
    return self.evaluate_difference(diff)

spectral_density `abstractmethod` ¶

spectral_density(omega: Tensor) -> Tensor

Fourier transform of the kernel (spectral density).

For shift-invariant kernels, this defines the sampling distribution for Random Fourier Features.

Parameters:

Name	Type	Description	Default
`omega`	`Tensor`	Frequency vectors of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Spectral density values of shape (...).

Source code in spectrans/kernels/base.py

@abstractmethod
def spectral_density(self, omega: Tensor) -> Tensor:
    """Fourier transform of the kernel (spectral density).

    For shift-invariant kernels, this defines the sampling
    distribution for Random Fourier Features.

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    pass

PolynomialKernel ¶

PolynomialKernel(degree: int = 2, alpha: float = 1.0, coef0: float = 0.0)

Bases: KernelFunction

Polynomial kernel.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = (\alpha \langle \mathbf{x}, \mathbf{y} \rangle + c)^d\).

Parameters:

Name	Type	Description	Default
`degree`	`int`	Polynomial degree.	`2`
`alpha`	`float`	Scaling of inner product.	`1.0`
`coef0`	`float`	Constant term.	`0.0`

Attributes:

Name	Type	Description
`degree`	`int`	The polynomial degree.
`alpha`	`float`	Inner product scaling.
`coef0`	`float`	Constant coefficient.

Methods:

Name	Description
`compute`	Compute polynomial kernel matrix.

Source code in spectrans/kernels/base.py

def __init__(
    self,
    degree: int = 2,
    alpha: float = 1.0,
    coef0: float = 0.0,
):
    self.degree = degree
    self.alpha = alpha
    self.coef0 = coef0

Functions¶

compute ¶

compute(x: Tensor, y: Tensor) -> Tensor

Compute polynomial kernel matrix.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	First input of shape (..., n, d).	required
`y`	`Tensor`	Second input of shape (..., m, d).	required

Returns:

Type	Description
`Tensor`	Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py

def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute polynomial kernel matrix.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    inner_product = torch.matmul(x, y.transpose(-2, -1))
    return (self.alpha * inner_product + self.coef0) ** self.degree

CosineKernel ¶

CosineKernel(eps: float = 1e-08)

Bases: KernelFunction

Cosine similarity kernel.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) =\) \(\frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\| \|\mathbf{y}\|}\).

Parameters:

Name	Type	Description	Default
`eps`	`float`	Small value for numerical stability.	`1e-8`

Attributes:

Name	Type	Description
`eps`	`float`	Numerical stability parameter.

Methods:

Name	Description
`compute`	Compute cosine similarity kernel matrix.

Source code in spectrans/kernels/base.py

def __init__(self, eps: float = 1e-8):
    self.eps = eps

Functions¶

compute ¶

compute(x: Tensor, y: Tensor) -> Tensor

Compute cosine similarity kernel matrix.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	First input of shape (..., n, d).	required
`y`	`Tensor`	Second input of shape (..., m, d).	required

Returns:

Type	Description
`Tensor`	Kernel matrix of shape (..., n, m).

Source code in spectrans/kernels/base.py

def compute(self, x: Tensor, y: Tensor) -> Tensor:
    """Compute cosine similarity kernel matrix.

    Parameters
    ----------
    x : Tensor
        First input of shape (..., n, d).
    y : Tensor
        Second input of shape (..., m, d).

    Returns
    -------
    Tensor
        Kernel matrix of shape (..., n, m).
    """
    x_norm = torch.norm(x, dim=-1, keepdim=True)  # (..., n, 1)
    y_norm = torch.norm(y, dim=-1, keepdim=True)  # (..., m, 1)

    x_normalized = x / (x_norm + self.eps)
    y_normalized = y / (y_norm + self.eps)

    return torch.matmul(x_normalized, y_normalized.transpose(-2, -1))

Base Kernel Classes¶

spectrans.kernels.base ¶

Classes¶

KernelFunction ¶

Functions¶

compute abstractmethod ¶

gram_matrix ¶

is_positive_definite ¶

RandomFeatureMap ¶

Functions¶

forward abstractmethod ¶

kernel_approximation ¶

ShiftInvariantKernel ¶

Functions¶

evaluate_difference abstractmethod ¶

compute ¶

spectral_density abstractmethod ¶

PolynomialKernel ¶

Functions¶

compute ¶

CosineKernel ¶

Functions¶

compute ¶

compute `abstractmethod` ¶

forward `abstractmethod` ¶

evaluate_difference `abstractmethod` ¶

spectral_density `abstractmethod` ¶