Clips a gradient adaptively. The threshold is a specified quantile of the absolute values in the gradient matrix itself. Any entry whose magnitude exceeds this threshold is set to the threshold, keeping its original sign; all other entries are left unchanged.

clip_gradient_by_quantile(gradient, quantile = 0.8)

Arguments

gradient

The gradient matrix to be clipped.

quantile

The quantile used to determine the clipping threshold, as a number between 0 and 1. For example, a value of `0.98` means that any gradient value larger in magnitude than the 98th percentile of all absolute gradient values will be clipped. Defaults to `0.8`.

Value

A new gradient matrix with extreme values clipped.

Details

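Because the threshold is recomputed from the gradient on every call, it follows the scale of the gradients at each optimization step, whereas a fixed threshold must be tuned by hand and can become too loose or too tight as that scale changes. Note that a quantile-based threshold clips roughly a fraction `1 - quantile` of the entries on every call, whether or not the gradient contains outliers.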
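The rule described above can be sketched in a few lines of base R. This is only an illustrative re-implementation under stated assumptions, not the package's actual source: the name `clip_by_quantile_sketch` is hypothetical, and the real function may differ in input validation, `NA` handling, or the quantile type it uses (the sketch relies on the default of `stats::quantile()`).

```r
# Hypothetical sketch of quantile-based clipping; not the package's source.
clip_by_quantile_sketch <- function(gradient, quantile = 0.8) {
  stopifnot(is.numeric(gradient), quantile >= 0, quantile <= 1)

  # Threshold: the requested quantile of the absolute gradient values
  threshold <- stats::quantile(abs(gradient), probs = quantile, names = FALSE)

  # Clamp every entry to [-threshold, threshold]. Entries already inside
  # the interval are unchanged, and clipped entries keep their sign.
  # pmin()/pmax() copy the dim attribute from their first argument,
  # so a matrix input yields a matrix output.
  pmax(pmin(gradient, threshold), -threshold)
}

# Small usage example on a 2 x 3 matrix
g <- matrix(c(-5, -1, 0.5, 2, 0.1, 8), nrow = 2)
res <- clip_by_quantile_sketch(g, quantile = 0.5)
print(res)
```

Clamping with `pmin()` and `pmax()` is one of several equivalent ways to express the rule; `sign(gradient) * pmin(abs(gradient), threshold)` would give the same result.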

Examples

# Create a gradient with some large outlier values
set.seed(123)
grad_matrix <- matrix(rnorm(100, mean = 0, sd = 1), 10, 10)
grad_matrix[1, 1] <- 10  # Large positive outlier
grad_matrix[5, 5] <- -12 # Large negative outlier

# Clip at the 80th percentile. This will tame the outliers.
clipped_grad <- clip_gradient_by_quantile(grad_matrix, quantile = 0.80)

cat("Original Gradient Range:\n")
#> Original Gradient Range:
print(range(grad_matrix))
#> [1] -12  10

cat("\n80th Percentile Threshold:\n")
#> 
#> 80th Percentile Threshold:
# The threshold will be the 80th percentile of the absolute values
print(quantile(abs(grad_matrix), probs = 0.80))
#>      80% 
#> 1.221391 

cat("\nClipped Gradient Range:\n")
#> 
#> Clipped Gradient Range:
# The new range will be capped at the threshold
print(range(clipped_grad))
#> [1] -1.221391  1.221391