Whiten (sphere) a data matrix \(X\) so that the covariance matrix of the output is approximately the identity. The implementation automatically chooses the more efficient of two algorithms according to whether the number of features \(p\) exceeds the number of samples \(n\); the choice matters most when \(p \gg n\) or \(n \gg p\).

fast_whiten(X, epsilon = 1e-08)

Arguments

X

A numeric matrix of size \(n \times p\), where \(n\) is the number of samples (rows) and \(p\) is the number of features (columns).

epsilon

A small non-negative numeric value used to stabilize the inversion of singular values (default = 1e-8).

Value

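A numeric matrix of the same dimensions as X containing the whitened data. When \(n \geq p\), the \(p \times p\) feature covariance of the output is approximately the identity. When \(p > n\), the feature covariance cannot have full rank, so the identity property can hold only in the sample space.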

Details

Whitening is performed via an SVD-based approach with Tikhonov-style regularization to handle small singular values.
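To illustrate why the regularization matters, the sketch below compares plain inversion of singular values with a damped inversion. The exact formula used by fast_whiten is not stated here; the damped form 1 / sqrt(d^2 + epsilon) and the vector d are assumptions chosen purely for illustration.

```r
# Hypothetical singular values; the last one is numerically zero.
d <- c(10, 1, 1e-12)
epsilon <- 1e-8

# Plain inversion blows up for the tiny singular value.
inv_plain <- 1 / d

# Damped inversion (assumed form, for illustration only):
# every entry is bounded above by 1 / sqrt(epsilon).
inv_damped <- 1 / sqrt(d^2 + epsilon)

print(inv_plain)
print(inv_damped)
```

With epsilon = 0 the two forms coincide, which is why a strictly positive value is the safer choice for rank-deficient input.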

  • If \(p > n\), whitening is computed in the sample space (avoids forming a \(p \times p\) covariance).

  • If \(n \geq p\), whitening is computed in the feature space (avoids forming an \(n \times n\) Gram matrix).
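The two branches above can be sketched as follows. This is a minimal illustration, not the package's actual code: the name whiten_sketch, the use of eigen() on the smaller cross-product matrix instead of a direct SVD, the symmetric (ZCA-style) transform, and the sqrt(n - 1) rescaling are all assumptions.

```r
# Minimal sketch of two-branch whitening; NOT the fast_whiten source.
# `whiten_sketch` is a hypothetical name used only for this example.
whiten_sketch <- function(X, epsilon = 1e-8) {
  n <- nrow(X)
  p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))  # centre each feature (column)

  if (p > n) {
    # Sample space: decompose the n x n matrix Xc %*% t(Xc).
    e <- eigen(tcrossprod(Xc), symmetric = TRUE)
    inv_d <- 1 / sqrt(pmax(e$values, 0) + epsilon)
    # Left-multiply by U diag(inv_d) U^T; rows become near-orthonormal.
    (e$vectors %*% (inv_d * t(e$vectors))) %*% Xc
  } else {
    # Feature space: decompose the p x p matrix t(Xc) %*% Xc.
    e <- eigen(crossprod(Xc), symmetric = TRUE)
    inv_d <- 1 / sqrt(pmax(e$values, 0) + epsilon)
    # Right-multiply by V diag(inv_d) V^T, rescaled so cov() is near identity.
    sqrt(n - 1) * Xc %*% (e$vectors %*% (inv_d * t(e$vectors)))
  }
}

set.seed(1)
Xa <- matrix(rnorm(200 * 5), 200, 5)   # n >= p: feature-space branch
Wa <- whiten_sketch(Xa)
err_feature <- max(abs(cov(Wa) - diag(5)))

Xb <- matrix(rnorm(10 * 100), 10, 100) # p > n: sample-space branch
Wb <- whiten_sketch(Xb)
# After centring, the rows span the subspace orthogonal to the constant vector.
err_sample <- max(abs(tcrossprod(Wb) - (diag(10) - 1 / 10)))

print(c(err_feature, err_sample))
```

Both errors should be close to zero. In the sample-space branch the sketch leaves the rows unscaled, so it is the row cross-product, not cov(), that is compared against a projection matrix.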

Computational complexity:

  • \(O(n^2 p)\) when \(p \gg n\)

  • \(O(n p^2)\) when \(n \gg p\)
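These costs match those of forming the smaller of the two cross-product matrices. The short sketch below, with made-up sizes, shows the two candidate matrices and checks that their non-zero eigenvalues agree, which is what makes it safe to work with the smaller one. It illustrates the general linear-algebra fact only and makes no claim about how fast_whiten is implemented internally.

```r
set.seed(1)
n <- 5
p <- 200
X <- matrix(rnorm(n * p), n, p)

dim_small <- dim(tcrossprod(X))  # n x n, costs O(n^2 p) to form
dim_large <- dim(crossprod(X))   # p x p, costs O(n p^2) to form

# eigen() returns eigenvalues in decreasing order for symmetric input,
# so the first n values of the large matrix are its non-zero ones.
ev_small <- eigen(tcrossprod(X), symmetric = TRUE, only.values = TRUE)$values
ev_large <- eigen(crossprod(X), symmetric = TRUE, only.values = TRUE)$values

same_spectrum <- isTRUE(all.equal(ev_small, ev_large[seq_len(n)], tolerance = 1e-6))
print(same_spectrum)
```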

Examples

set.seed(42)

# Case 1: p >> n
X1 <- matrix(rnorm(50 * 2000), 50, 2000)
Xw1 <- fast_whiten(X1)
round(cov(t(Xw1))[1:5, 1:5], 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0    0    0
#> [3,]    0    0    0    0    0
#> [4,]    0    0    0    0    0
#> [5,]    0    0    0    0    0

# Case 2: n >> p
X2 <- matrix(rnorm(2000 * 50), 2000, 50)
Xw2 <- fast_whiten(X2)
round(cov(Xw2)[1:5, 1:5], 3)
#>       [,1]  [,2]  [,3]  [,4]  [,5]
#> [1,] 0.001 0.000 0.000 0.000 0.000
#> [2,] 0.000 0.001 0.000 0.000 0.000
#> [3,] 0.000 0.000 0.001 0.000 0.000
#> [4,] 0.000 0.000 0.000 0.001 0.000
#> [5,] 0.000 0.000 0.000 0.000 0.001