Random Fourier Features¶

spectrans.kernels.rff ¶

Random Fourier Features (RFF) for kernel approximation.

This module implements Random Fourier Features, a technique for approximating shift-invariant kernels through explicit feature maps. RFF enables linear-time computation of kernel operations that would normally require quadratic time.

The implementation supports various kernel types including Gaussian (RBF), Laplacian, and other shift-invariant kernels. It also includes orthogonal random features that reduce approximation variance.

Classes:

Name	Description
`GaussianRFFKernel`	Gaussian/RBF kernel with RFF approximation.
`LaplacianRFFKernel`	Laplacian kernel with RFF approximation.
`OrthogonalRandomFeatures`	Orthogonal variant of random features for better approximation.
`RFFAttentionKernel`	Specialized RFF for attention mechanisms.

Examples:

Basic Gaussian RFF usage:

>>> import torch
>>> from spectrans.kernels.rff import GaussianRFFKernel
>>> kernel = GaussianRFFKernel(input_dim=64, num_features=256, sigma=1.0)
>>> x = torch.randn(32, 100, 64)  # (batch, sequence, dim)
>>> features = kernel.feature_map(x)
>>> assert features.shape == (32, 100, 256)

Computing approximate kernel matrix:

>>> y = torch.randn(32, 50, 64)
>>> K_approx = kernel.kernel_approximation(x, y)
>>> assert K_approx.shape == (32, 100, 50)

Using orthogonal features:

>>> from spectrans.kernels.rff import OrthogonalRandomFeatures
>>> orf = OrthogonalRandomFeatures(input_dim=64, num_features=256)
>>> features = orf(x)

Notes

For a shift-invariant kernel \(k(\mathbf{x}, \mathbf{y}) = \kappa(\mathbf{x} - \mathbf{y})\) with Fourier transform \(p(\omega)\), Bochner's theorem gives:

\[ k(\mathbf{x}, \mathbf{y}) = \int p(\omega) \exp(i\omega^T(\mathbf{x}-\mathbf{y})) d\omega \]

The RFF approximation samples \(\omega \sim p(\omega)\) and uses:

\[ \varphi(\mathbf{x}) = \sqrt{\frac{2}{D}} \left[\cos(\omega_1^T\mathbf{x} + b_1), \ldots, \cos(\omega_D^T\mathbf{x} + b_D)\right] \]

This gives \(k(\mathbf{x}, \mathbf{y}) \approx \varphi(\mathbf{x})^T \varphi(\mathbf{y})\) with approximation error \(O(1/\sqrt{D})\).

For Gaussian kernel: \(p(\omega) = \mathcal{N}(0, \sigma^2 I)\)

For Laplacian kernel: \(p(\omega) = \text{Cauchy}(0, \sigma)\)

References

Ali Rahimi and Benjamin Recht. 2007. Random features for large-scale kernel machines. In Advances in Neural Information Processing Systems 20 (NeurIPS 2007), pages 1177-1184.

Felix X. Yu, Ananda Theertha Suresh, Krzysztof M. Choromanski, Daniel N. Holtmann-Rice, and Sanjiv Kumar. 2016. Orthogonal random features. In Advances in Neural Information Processing Systems 29 (NeurIPS 2016), pages 1975-1983.

Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, and Adrian Weller. 2021. Rethinking attention with performers. In Proceedings of the International Conference on Learning Representations (ICLR).

Classes¶

GaussianRFFKernel ¶

GaussianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, orthogonal: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Gaussian (RBF) kernel with Random Fourier Features approximation.

Implements the Gaussian kernel using RFF.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|^2}{2\sigma^2}\right)\).

Parameters:

Name	Type	Description	Default
`input_dim`	`int`	Dimension of input vectors.	required
`num_features`	`int`	Number of random Fourier features.	required
`sigma`	`float`	Kernel bandwidth (standard deviation).	`1.0`
`use_cos_sin`	`bool`	If True, use both cos and sin features (doubles feature dimension).	`False`
`orthogonal`	`bool`	If True, use orthogonal random features.	`False`
`trainable`	`bool`	If True, make random parameters trainable.	`False`
`seed`	`int \| None`	Random seed for reproducibility.	`None`

Attributes:

Name	Type	Description
`omega`	`Parameter or Tensor`	Random frequencies of shape (input_dim, num_features).
`bias`	`Parameter or Tensor`	Random phase shifts of shape (num_features,).

Methods:

Name	Description
`forward`	Apply random Fourier feature map.
`evaluate_difference`	Evaluate Gaussian kernel on difference vectors.
`spectral_density`	Spectral density for Gaussian kernel (Gaussian distribution).

Source code in spectrans/kernels/rff.py

def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    orthogonal: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.orthogonal = orthogonal
    self.trainable = trainable

    # Effective number of output features
    self.output_features = num_features * 2 if use_cos_sin else num_features

    # Initialize random parameters
    self._initialize_parameters()

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Feature mapped tensor of shape (..., n, D) where D is self.output_features.

Source code in spectrans/kernels/rff.py

def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D) where D is
        self.output_features.
    """
    # Linear projection: (..., n, d) @ (d, m) -> (..., n, m)
    projection = torch.matmul(x, self.omega)

    # Add phase shifts
    projection = projection + self.bias

    if self.use_cos_sin:
        # Use both cos and sin features
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        # Normalization factor for cos+sin
        scale = math.sqrt(1.0 / self.num_features)
    else:
        # Use only cos features
        features = torch.cos(projection)
        # Normalization factor for cos only
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale

evaluate_difference ¶

evaluate_difference(diff: Tensor) -> Tensor

Evaluate Gaussian kernel on difference vectors.

Parameters:

Name	Type	Description	Default
`diff`	`Tensor`	Difference vectors of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Kernel values of shape (...).

Source code in spectrans/kernels/rff.py

def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Gaussian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    squared_norm = torch.sum(diff**2, dim=-1)
    return torch.exp(-squared_norm / (2 * self.sigma**2))

spectral_density ¶

spectral_density(omega: Tensor) -> Tensor

Spectral density for Gaussian kernel (Gaussian distribution).

Parameters:

Name	Type	Description	Default
`omega`	`Tensor`	Frequency vectors of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py

def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Gaussian kernel (Gaussian distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    norm_squared = torch.sum(omega**2, dim=-1)
    # Gaussian spectral density
    result: Tensor = (2 * math.pi * self.sigma**2) ** (d / 2) * torch.exp(
        -0.5 * self.sigma**2 * norm_squared
    )
    return result

LaplacianRFFKernel ¶

LaplacianRFFKernel(input_dim: int, num_features: int, sigma: float = 1.0, use_cos_sin: bool = False, trainable: bool = False, seed: int | None = None)

Bases: ShiftInvariantKernel, RandomFeatureMap

Laplacian kernel with Random Fourier Features approximation.

Implements the Laplacian kernel using RFF with Cauchy distribution.

The kernel function is: \(k(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\|\mathbf{x} - \mathbf{y}\|_1}{\sigma}\right)\).

Parameters:

Name	Type	Description	Default
`input_dim`	`int`	Dimension of input vectors.	required
`num_features`	`int`	Number of random Fourier features.	required
`sigma`	`float`	Kernel bandwidth parameter.	`1.0`
`use_cos_sin`	`bool`	If True, use both cos and sin features.	`False`
`trainable`	`bool`	If True, make random parameters trainable.	`False`
`seed`	`int \| None`	Random seed for reproducibility.	`None`

Methods:

Name	Description
`forward`	Apply random Fourier feature map.
`evaluate_difference`	Evaluate Laplacian kernel on difference vectors.
`spectral_density`	Spectral density for Laplacian kernel (Cauchy distribution).

Source code in spectrans/kernels/rff.py

def __init__(
    self,
    input_dim: int,
    num_features: int,
    sigma: float = 1.0,
    use_cos_sin: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    ShiftInvariantKernel.__init__(self, bandwidth=1.0 / sigma)
    RandomFeatureMap.__init__(self, input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.sigma = sigma
    self.use_cos_sin = use_cos_sin
    self.trainable = trainable

    self.output_features = num_features * 2 if use_cos_sin else num_features

    self._initialize_parameters()

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Apply random Fourier feature map.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py

def forward(self, x: Tensor) -> Tensor:
    """Apply random Fourier feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    projection = torch.matmul(x, self.omega) + self.bias

    if self.use_cos_sin:
        cos_features = torch.cos(projection)
        sin_features = torch.sin(projection)
        features = torch.cat([cos_features, sin_features], dim=-1)
        scale = math.sqrt(1.0 / self.num_features)
    else:
        features = torch.cos(projection)
        scale = math.sqrt(2.0 / self.num_features)

    return features * scale

evaluate_difference ¶

evaluate_difference(diff: Tensor) -> Tensor

Evaluate Laplacian kernel on difference vectors.

Parameters:

Name	Type	Description	Default
`diff`	`Tensor`	Difference vectors of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Kernel values of shape (...).

Source code in spectrans/kernels/rff.py

def evaluate_difference(self, diff: Tensor) -> Tensor:
    """Evaluate Laplacian kernel on difference vectors.

    Parameters
    ----------
    diff : Tensor
        Difference vectors of shape (..., d).

    Returns
    -------
    Tensor
        Kernel values of shape (...).
    """
    l1_norm = torch.sum(torch.abs(diff), dim=-1)
    return torch.exp(-l1_norm / self.sigma)

spectral_density ¶

spectral_density(omega: Tensor) -> Tensor

Spectral density for Laplacian kernel (Cauchy distribution).

Parameters:

Name	Type	Description	Default
`omega`	`Tensor`	Frequency vectors of shape (..., d).	required

Returns:

Type	Description
`Tensor`	Spectral density values of shape (...).

Source code in spectrans/kernels/rff.py

def spectral_density(self, omega: Tensor) -> Tensor:
    """Spectral density for Laplacian kernel (Cauchy distribution).

    Parameters
    ----------
    omega : Tensor
        Frequency vectors of shape (..., d).

    Returns
    -------
    Tensor
        Spectral density values of shape (...).
    """
    d = omega.shape[-1]
    # Product of 1D Cauchy densities
    density = torch.ones_like(omega[..., 0])
    for i in range(d):
        density = density * (
            2 * self.sigma / (math.pi * (1 + (self.sigma * omega[..., i]) ** 2))
        )
    return density

OrthogonalRandomFeatures ¶

OrthogonalRandomFeatures(input_dim: int, num_features: int, kernel_type: Literal['gaussian', 'laplacian'] = 'gaussian', sigma: float = 1.0, use_hadamard: bool = False, trainable: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Orthogonal Random Features for kernel approximation.

Uses structured orthogonal matrices to reduce approximation variance compared to standard i.i.d. Gaussian features.

Parameters:

Name	Type	Description	Default
`input_dim`	`int`	Dimension of input vectors.	required
`num_features`	`int`	Number of random features.	required
`kernel_type`	`Literal['gaussian', 'laplacian']`	Type of kernel to approximate.	`"gaussian"`
`sigma`	`float`	Kernel bandwidth parameter.	`1.0`
`use_hadamard`	`bool`	If True, use fast Hadamard transform.	`False`
`trainable`	`bool`	If True, make scaling parameters trainable.	`False`
`seed`	`int \| None`	Random seed.	`None`

Methods:

Name	Description
`forward`	Apply orthogonal random feature map.

Source code in spectrans/kernels/rff.py

def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["gaussian", "laplacian"] = "gaussian",
    sigma: float = 1.0,
    use_hadamard: bool = False,
    trainable: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, kernel_scale=sigma, seed=seed)

    self.kernel_type = kernel_type
    self.sigma = sigma
    self.use_hadamard = use_hadamard
    self.trainable = trainable

    self._initialize_parameters()

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Apply orthogonal random feature map.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py

def forward(self, x: Tensor) -> Tensor:
    """Apply orthogonal random feature map.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Feature mapped tensor of shape (..., n, D).
    """
    if self.use_hadamard:
        # Pad input if necessary
        if x.shape[-1] < self.d_padded:
            padding = self.d_padded - x.shape[-1]
            x = F.pad(x, (0, padding))

        # Apply HD HD HD structure
        z = x
        for i in range(3):
            if hasattr(self, "diagonals"):
                diag = self.diagonals[i]
            else:
                diag = getattr(self, f"diagonal_{i}")
            z = z * diag
            z = self._hadamard_transform(z)

        # Truncate to desired number of features
        projection = z[..., : self.num_features]
    else:
        projection = torch.matmul(x, self.projection)

    # Add bias and apply cosine
    projection = projection + self.bias
    features = torch.cos(projection)

    # Normalize
    scale = math.sqrt(2.0 / self.num_features)
    return features * scale

RFFAttentionKernel ¶

RFFAttentionKernel(input_dim: int, num_features: int, kernel_type: Literal['softmax', 'relu', 'elu'] = 'softmax', use_orthogonal: bool = True, redraw: bool = False, seed: int | None = None)

Bases: RandomFeatureMap

Random Fourier Features specifically designed for attention mechanisms.

Implements positive random features for use in linear attention, following the Performer architecture.

Parameters:

Name	Type	Description	Default
`input_dim`	`int`	Dimension of input vectors (typically head_dim).	required
`num_features`	`int`	Number of random features.	required
`kernel_type`	`Literal['softmax', 'relu', 'elu']`	Type of kernel approximation.	`"softmax"`
`use_orthogonal`	`bool`	If True, use orthogonal random features.	`True`
`redraw`	`bool`	If True, redraw random features at each forward pass.	`False`
`seed`	`int \| None`	Random seed.	`None`

Methods:

Name	Description
`forward`	Apply random feature map for attention.

Source code in spectrans/kernels/rff.py

def __init__(
    self,
    input_dim: int,
    num_features: int,
    kernel_type: Literal["softmax", "relu", "elu"] = "softmax",
    use_orthogonal: bool = True,
    redraw: bool = False,
    seed: int | None = None,
):
    super().__init__(input_dim, num_features, seed=seed)

    self.kernel_type = kernel_type
    self.use_orthogonal = use_orthogonal
    self.redraw = redraw

    if not redraw:
        self._initialize_parameters()

Functions¶

forward ¶

forward(x: Tensor) -> Tensor

Apply random feature map for attention.

Parameters:

Name	Type	Description	Default
`x`	`Tensor`	Input tensor of shape (..., n, d).	required

Returns:

Type	Description
`Tensor`	Positive feature mapped tensor of shape (..., n, D).

Source code in spectrans/kernels/rff.py

def forward(self, x: Tensor) -> Tensor:
    """Apply random feature map for attention.

    Parameters
    ----------
    x : Tensor
        Input tensor of shape (..., n, d).

    Returns
    -------
    Tensor
        Positive feature mapped tensor of shape (..., n, D).
    """
    if self.redraw:
        # Redraw random features (useful for training)
        projection = (
            self._sample_orthogonal_gaussian()
            if self.use_orthogonal
            else torch.randn(self.input_dim, self.num_features, device=x.device)
        )
        projection = projection / math.sqrt(self.input_dim)
    else:
        projection = self.projection

    # Linear projection
    z = torch.matmul(x, projection)

    if self.kernel_type == "softmax":
        # Positive features for softmax kernel approximation
        # $\varphi(\mathbf{x}) = \exp(\mathbf{x}^T \omega - \|\mathbf{x}\|^2/2) / \sqrt{m}$
        x_norm_sq = torch.sum(x**2, dim=-1, keepdim=True) / 2
        features = torch.exp(z - x_norm_sq)
        scale = 1.0 / math.sqrt(self.num_features)

    elif self.kernel_type == "relu":
        # ReLU kernel: $\max(0, \mathbf{x}^T \omega)$
        features = F.relu(z)
        scale = math.sqrt(2.0 / self.num_features)

    else:  # elu
        # ELU kernel for smooth approximation
        features = F.elu(z) + 1
        scale = 1.0 / math.sqrt(self.num_features)

    return features * scale