Skip to content

Random Fourier Features

spectrans.kernels.rff

Random Fourier Features (RFF) for kernel approximation.

This module implements Random Fourier Features, a technique for approximating shift-invariant kernels through explicit feature maps. RFF enables linear-time computation of kernel operations that would normally require quadratic time.

The implementation supports various kernel types including Gaussian (RBF), Laplacian, and other shift-invariant kernels. It also includes orthogonal random features that reduce approximation variance.

Classes:

Name Description
GaussianRFFKernel

Gaussian/RBF kernel with RFF approximation.

LaplacianRFFKernel

Laplacian kernel with RFF approximation.

OrthogonalRandomFeatures

Orthogonal variant of random features for better approximation.

RFFAttentionKernel

Specialized RFF for attention mechanisms.

Examples:

Basic Gaussian RFF usage:

>>> import torch
>>> from spectrans.kernels.rff import GaussianRFFKernel
>>> kernel = GaussianRFFKernel(input_dim=64, num_features=256, sigma=1.0)
>>> x = torch.randn(32, 100, 64)  # (batch, sequence, dim)
>>> features = kernel.feature_map(x)
>>> assert features.shape == (32, 100, 256)

Computing approximate kernel matrix:

>>> y = torch.randn(32, 50, 64)
>>> K_approx = kernel.kernel_approximation(x, y)
>>> assert K_approx.shape == (32, 100, 50)

Using orthogonal features:

>>> from spectrans.kernels.rff import OrthogonalRandomFeatures
>>> orf = OrthogonalRandomFeatures(input_dim=64, num_features=256)
>>> features = orf(x)
Notes

For a shift-invariant kernel \(k(\mathbf{x}, \mathbf{y}) = \kappa(\mathbf{x} - \mathbf{y})\) with Fourier transform \(p(\omega)\), Bochner's theorem gives:

\[ k(\mathbf{x}, \mathbf{y}) = \int p(\omega) \exp(i\omega^T(\mathbf{x}-\mathbf{y})) d\omega \]

The RFF approximation samples \(\omega \sim p(\omega)\) and uses:

\[ \varphi(\mathbf{x}) = \sqrt{\frac{2}{D}} \left[\cos(\omega_1^T\mathbf{x} + b_1), \ldots, \cos(\omega_D^T\mathbf{x} + b_D)\right] \]

This gives \(k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})\) with approximation error \(O(1/\sqrt{D})\).

For Gaussian kernel: \(p(\omega) = \mathcal{N}(0, \sigma^2 I)\)

For Laplacian kernel: \(p(\omega) = \text{Cauchy}(0, \sigma)\)

References

Ali Rahimi and Benjamin Recht. 2007. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20 (NeurIPS 2007), pages 1177-1184.

Felix X. Yu, Ananda Theertha Suresh, Krzysztof M. Choromanski, Daniel N. Holtmann-Rice, and Sanjiv Kumar. 2016. Orthogonal random features. In Advances in Neural Information Processing Systems 29 (NeurIPS 2016), pages 1975-1983.

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, and Adrian Weller. 2021. Rethinking attention with performers. In Proceedings of the International Conference on Learning Representations (ICLR).

See Also

spectrans.kernels.base : Base kernel interfaces. spectrans.layers.attention.spectral : Spectral attention using RFF.

Classes

GaussianRFFKernel

GaussianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, orthogonal: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Gaussian (RBF) kernel with Random Fourier Features approximation.

Implements the Gaussian kernel using RFF.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)\).

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random Fourier features.

required
sigma float

Kernel bandwidth (standard deviation).

1.0
use_cos_sin bool

If True, use both cos and sin features (doubles feature dimension).

False
orthogonal bool

If True, use orthogonal random features.

False
trainable bool

If True, make random parameters trainable.

False
seed int | None

Random seed for reproducibility.

None

Attributes:

Name Type Description
omega Parameter or Tensor

Random frequencies of shape (input_dim, num_features).

bias Parameter or Tensor

Random phase shifts of shape (num_features,).

Methods:

Name Description
forward

Apply random Fourier feature map.

evaluate_difference

Evaluate Gaussian kernel on difference vectors.

spectral_density

Spectral density for Gaussian kernel (Gaussian distribution).

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    orthogonal: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.orthogonal = orthogonal
    self.trainable = trainable

    # Effective number of output features
    self.output_features = num_features * 2 if use_cos_sin else num_features

    # Initialize random parameters
    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D) where D is self.output_features.

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D) where D is
        self.output_features.
    """
    # Linear projection: (..., n, d) @ (d, m) -> (..., n, m)
    projection = torch.matmul(x, self.omega)

    # Add phase shifts
    projection = projection + self.bias

    if self.use_cos_sin:
        # Use both cos and sin features
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        # Normalization factor for cos+sin
        scale = math.sqrt(1.0 / self.num_features)
    else:
        # Use only cos features
        features = torch.cos(projection)
        # Normalization factor for cos only
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale
evaluate_difference
evaluate_difference(diff: Tensor) -> Tensor

Evaluate Gaussian kernel on difference vectors.

Parameters:

Name Type Description Default
diff Tensor

Difference vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Kernel values of shape (...).

Source code in spectrans/kernels/rff.py
def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Gaussian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    squared_norm = torch.sum(diff**2, dim=-1)
    return torch.exp(-squared_norm / (2 * self.sigma**2))
spectral_density
spectral_density(omega: Tensor) -> Tensor

Spectral density for Gaussian kernel (Gaussian distribution).

Parameters:

Name Type Description Default
omega Tensor

Frequency vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py
def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Gaussian kernel (Gaussian distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    norm_squared = torch.sum(omega**2, dim=-1)
    # Gaussian spectral density
    result: Tensor = (2 * math.pi * self.sigma**2) ** (d / 2) * torch.exp(
        -0.5 * self.sigma**2 * norm_squared
    )
    return result

LaplacianRFFKernel

LaplacianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Laplacian kernel with Random Fourier Features approximation.

Implements the Laplacian kernel using RFF with Cauchy distribution.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|_1}{\sigma}\right)\).

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random Fourier features.

required
sigma float

Kernel bandwidth parameter.

1.0
use_cos_sin bool

If True, use both cos and sin features.

False
trainable bool

If True, make random parameters trainable.

False
seed int | None

Random seed for reproducibility.

None

Methods:

Name Description
forward

Apply random Fourier feature map.

evaluate_difference

Evaluate Laplacian kernel on difference vectors.

spectral_density

Spectral density for Laplacian kernel (Cauchy distribution).

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.trainable = trainable

    self.output_features = num_features * 2 if use_cos_sin else num_features

    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    projection = torch.matmul(x, self.omega) + self.bias

    if self.use_cos_sin:
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        scale = math.sqrt(1.0 / self.num_features)
    else:
        features = torch.cos(projection)
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale
evaluate_difference
evaluate_difference(diff: Tensor) -> Tensor

Evaluate Laplacian kernel on difference vectors.

Parameters:

Name Type Description Default
diff Tensor

Difference vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Kernel values of shape (...).

Source code in spectrans/kernels/rff.py
def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Laplacian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    l1_norm = torch.sum(torch.abs(diff), dim=-1)
    return torch.exp(-l1_norm / self.sigma)
spectral_density
spectral_density(omega: Tensor) -> Tensor

Spectral density for Laplacian kernel (Cauchy distribution).

Parameters:

Name Type Description Default
omega Tensor

Frequency vectors of shape (..., d).

required

Returns:

Type Description
Tensor

Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py
def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Laplacian kernel (Cauchy distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    # Product of 1D Cauchy densities
    density = torch.ones_like(omega[..., 0])
    for i in range(d):
        density = density * (
            2 * self.sigma / (math.pi * (1 + (self.sigma * omega[..., i]) ** 2))
        )
    return density

OrthogonalRandomFeatures

OrthogonalRandomFeatures(input_dim: int, num_features: int, kernel_type: Literal['gaussian', 'laplacian'] = 'gaussian', sigma: float = 1.0, use_hadamard: bool = False, trainable: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Orthogonal Random Features for kernel approximation.

Uses structured orthogonal matrices to reduce approximation variance compared to standard i.i.d. Gaussian features.

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors.

required
num_features int

Number of random features.

required
kernel_type Literal['gaussian', 'laplacian']

Type of kernel to approximate.

"gaussian"
sigma float

Kernel bandwidth parameter.

1.0
use_hadamard bool

If True, use fast Hadamard transform.

False
trainable bool

If True, make scaling parameters trainable.

False
seed int | None

Random seed.

None

Methods:

Name Description
forward

Apply orthogonal random feature map.

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["gaussian", "laplacian"] = "gaussian",
    sigma: float = 1.0,
    use_hadamard: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.kernel_type = kernel_type
    self.sigma = sigma
    self.use_hadamard = use_hadamard
    self.trainable = trainable

    self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply orthogonal random feature map.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply orthogonal random feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    if self.use_hadamard:
        # Pad input if necessary
        if x.shape[-1] < self.d_padded:
            padding = self.d_padded - x.shape[-1]
            x = F.pad(x, (0, padding))

        # Apply HD HD HD structure
        z = x
        for i in range(3):
            if hasattr(self, "diagonals"):
                diag = self.diagonals[i]
            else:
                diag = getattr(self, f"diagonal_{i}")
            z = z * diag
            z = self._hadamard_transform(z)

        # Truncate to desired number of features
        projection = z[..., : self.num_features]
    else:
        projection = torch.matmul(x, self.projection)

    # Add bias and apply cosine
    projection = projection + self.bias
    features = torch.cos(projection)

    # Normalize
    scale = math.sqrt(2.0 / self.num_features)
    return features * scale

RFFAttentionKernel

RFFAttentionKernel(input_dim: int, num_features: int, kernel_type: Literal['softmax', 'relu', 'elu'] = 'softmax', use_orthogonal: bool = True, redraw: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Random Fourier Features specifically designed for attention mechanisms.

Implements positive random features for use in linear attention, following the Performer architecture.

Parameters:

Name Type Description Default
input_dim int

Dimension of input vectors (typically head_dim).

required
num_features int

Number of random features.

required
kernel_type Literal['softmax', 'relu', 'elu']

Type of kernel approximation.

"softmax"
use_orthogonal bool

If True, use orthogonal random features.

True
redraw bool

If True, redraw random features at each forward pass.

False
seed int | None

Random seed.

None

Methods:

Name Description
forward

Apply random feature map for attention.

Source code in spectrans/kernels/rff.py
def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["softmax", "relu", "elu"] = "softmax",
    use_orthogonal: bool = True,
    redraw: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, seed=seed)

    self.kernel_type = kernel_type
    self.use_orthogonal = use_orthogonal
    self.redraw = redraw

    if not redraw:
        self._initialize_parameters()
Functions
forward
forward(x: Tensor) -> Tensor

Apply random feature map for attention.

Parameters:

Name Type Description Default
x Tensor

Input tensor of shape (..., n, d).

required

Returns:

Type Description
Tensor

Positive feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py
def forward(self, x: Tensor) -> Tensor:
    """Apply random feature map for attention.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Positive feature mapped tensor of shape (..., n, D).
    """
    if self.redraw:
        # Redraw random features (useful for training)
        projection = (
            self._sample_orthogonal_gaussian()
            if self.use_orthogonal
            else torch.randn(self.input_dim, self.num_features, device=x.device)
        )
        projection = projection / math.sqrt(self.input_dim)
    else:
        projection = self.projection

    # Linear projection
    z = torch.matmul(x, projection)

    if self.kernel_type == "softmax":
        # Positive features for softmax kernel approximation
        # $\varphi(\mathbf{x}) = \exp(\mathbf{x}^T \omega - \|\mathbf{x}\|^2/2) / \sqrt{m}$
        x_norm_sq = torch.sum(x**2, dim=-1, keepdim=True) / 2
        features = torch.exp(z - x_norm_sq)
        scale = 1.0 / math.sqrt(self.num_features)

    elif self.kernel_type == "relu":
        # ReLU kernel: $\max(0, \mathbf{x}^T \omega)$
        features = F.relu(z)
        scale = math.sqrt(2.0 / self.num_features)

    else:  # elu
        # ELU kernel for smooth approximation
        features = F.elu(z) + 1
        scale = 1.0 / math.sqrt(self.num_features)

    return features * scale

Functions