fast_whiten.Rd
Perform whitening (sphering) of a data matrix \(X\) so that the covariance matrix of the output is approximately the identity. The implementation picks the more efficient of two algorithms based on the shape of \(X\). One suits data with many more features \(p\) than samples \(n\) (\(p \gg n\)). The other suits the reverse case (\(n \gg p\)).
fast_whiten(X, epsilon = 1e-08)
A numeric matrix of the same dimensions as \(X\), with whitened columns. The covariance of the whitened data is approximately the identity in the space where whitening is performed: the sample space when \(p > n\), and the feature space otherwise.
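"Approximately the identity" can be quantified as the largest absolute entry of \(\mathrm{cov}(W) - I\). The helper `whitening_error()` below is hypothetical and not part of the package. As an independent reference point, the example whitens a small tall matrix exactly with base R's `svd()`. For centered \(X_c = U D V^\top\) with \(n > p\), the matrix \(\sqrt{n-1}\,U V^\top\) has sample covariance \(V V^\top = I_p\).

```r
# Hypothetical helper (not part of the package): how far is cov(W) from the identity?
# Checks column (feature-space) whitening; pass t(W) to check rows instead.
whitening_error <- function(W) {
  C <- cov(W)
  max(abs(C - diag(ncol(W))))
}

set.seed(2)
X <- matrix(rnorm(100 * 4), 100, 4)
Xc <- sweep(X, 2, colMeans(X))             # center each column
s <- svd(Xc)                               # Xc = U D V^T
W <- sqrt(nrow(X) - 1) * s$u %*% t(s$v)    # exact whitening, no regularization

whitening_error(X)   # raw data: clearly non-zero
whitening_error(W)   # whitened data: round-off level
```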
Whitening is performed via an SVD-based approach with Tikhonov-style regularization to handle small singular values.
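One common way to write SVD-based whitening with Tikhonov-style regularization is shown below. It is a sketch only. The exact form used by `fast_whiten` is an assumption here, including where \(\epsilon\) enters, whether \(X\) is centered, and any rescaling by \(n-1\). The key point is that the regularized factor shrinks directions with \(d_i^2 \ll \epsilon\) toward zero instead of dividing by a near-zero singular value. The two expressions for \(\tilde{X}\) are equal, so the computation can be done in whichever space is smaller.

```latex
\begin{aligned}
X &= U D V^\top, \qquad D = \operatorname{diag}(d_1, \dots, d_r), \\
\tilde{X} &= X \,(X^\top X + \epsilon I_p)^{-1/2}
           = (X X^\top + \epsilon I_n)^{-1/2} X
           = U \operatorname{diag}\!\Big(\frac{d_i}{\sqrt{d_i^2 + \epsilon}}\Big) V^\top, \\
\tilde{X}^\top \tilde{X} &= V \operatorname{diag}\!\Big(\frac{d_i^2}{d_i^2 + \epsilon}\Big) V^\top
  \approx I_p \quad \text{if } X \text{ has full column rank and every } d_i^2 \gg \epsilon .
\end{aligned}
```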
- If \(p > n\), whitening is computed in the sample space (avoids forming a \(p \times p\) covariance).
- If \(n \ge p\), whitening is computed in the feature space (avoids forming an \(n \times n\) covariance).
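The two branches above can be sketched in base R as follows. This is a hypothetical illustration, not the package's code. `whiten_sketch()` and `inv_sqrt_psd()` are made-up names. Several choices are also assumptions: centering the columns, adding `epsilon` to the eigenvalues of the Gram matrix as the Tikhonov-style term, using the symmetric (ZCA-style) rotation, and rescaling by \(\sqrt{n-1}\) so that `cov()` of the result is close to the identity.

```r
# Hypothetical sketch, not the package's fast_whiten().
# Assumptions: columns are centered, epsilon is added to the eigenvalues of the
# Gram matrix (Tikhonov-style), and the result is scaled by sqrt(n - 1).

# Regularized inverse square root of a symmetric PSD matrix: (S + epsilon * I)^(-1/2)
inv_sqrt_psd <- function(S, epsilon) {
  e <- eigen(S, symmetric = TRUE)
  lambda <- pmax(e$values, 0)  # clip tiny negative round-off eigenvalues
  # V %*% diag(1 / sqrt(lambda + epsilon)) %*% t(V), without building the diagonal
  e$vectors %*% (t(e$vectors) / sqrt(lambda + epsilon))
}

whiten_sketch <- function(X, epsilon = 1e-8, space = c("auto", "sample", "feature")) {
  space <- match.arg(space)
  n <- nrow(X)
  p <- ncol(X)
  Xc <- sweep(X, 2, colMeans(X))  # center each column
  if (space == "auto") space <- if (p > n) "sample" else "feature"
  Xw <- if (space == "sample") {
    # n x n Gram matrix: O(n^2 p) to form, O(n^3) to decompose
    inv_sqrt_psd(tcrossprod(Xc), epsilon) %*% Xc
  } else {
    # p x p Gram matrix: O(n p^2) to form, O(p^3) to decompose
    Xc %*% inv_sqrt_psd(crossprod(Xc), epsilon)
  }
  sqrt(n - 1) * Xw
}

# Usage
set.seed(1)
X_tall <- matrix(rnorm(200 * 5), 200, 5)  # n >> p: feature-space branch
X_wide <- matrix(rnorm(10 * 30), 10, 30)  # p > n: sample-space branch
round(cov(whiten_sketch(X_tall)), 6)
```

Forcing `space` to either value returns the same matrix up to round-off, because \((X X^\top + \epsilon I)^{-1/2} X = X (X^\top X + \epsilon I)^{-1/2}\). The size of the matrix passed to `eigen()` is the only thing that changes, and that size is what drives the cost difference.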
Computational complexity:
- \(O(n^2 p)\) when \(p \gg n\)
- \(O(n p^2)\) when \(n \gg p\)
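For the dimensions in Case 1 of the examples (\(n = 50\), \(p = 2000\)), the two bounds differ by a factor of \(p/n\). The arithmetic below ignores constant factors. It also ignores the decomposition cost itself (\(O(n^3)\) or \(O(p^3)\)), which is dominated by the Gram-matrix cost in each regime because \(\min(n,p)^3 \le n\,p\,\min(n,p)\).

```latex
\begin{aligned}
n^2 p &= 50^2 \cdot 2000 = 5 \times 10^{6}, \\
n p^2 &= 50 \cdot 2000^2 = 2 \times 10^{8}, \\
\frac{n p^2}{n^2 p} &= \frac{p}{n} = \frac{2000}{50} = 40 .
\end{aligned}
```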
set.seed(42)
# Case 1: p >> n
X1 <- matrix(rnorm(50 * 2000), 50, 2000)
Xw1 <- fast_whiten(X1)
round(cov(t(Xw1))[1:5, 1:5], 3)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    0    0    0    0    0
#> [2,]    0    0    0    0    0
#> [3,]    0    0    0    0    0
#> [4,]    0    0    0    0    0
#> [5,]    0    0    0    0    0
# Case 2: n >> p
X2 <- matrix(rnorm(2000 * 50), 2000, 50)
Xw2 <- fast_whiten(X2)
round(cov(Xw2)[1:5, 1:5], 3)
#>       [,1]  [,2]  [,3]  [,4]  [,5]
#> [1,] 0.001 0.000 0.000 0.000 0.000
#> [2,] 0.000 0.001 0.000 0.000 0.000
#> [3,] 0.000 0.000 0.001 0.000 0.000
#> [4,] 0.000 0.000 0.000 0.001 0.000
#> [5,] 0.000 0.000 0.000 0.000 0.001