This function selects a single number of components (k) to represent a multi-view dataset by analyzing the joint variance-explained curve. It offers two methods for selecting k (`"elbow"` and `"threshold"`) and includes a built-in self-test that checks its behavior on simulated data.

select_joint_k(
  mat_list = NULL,
  method = c("pca", "spca"),
  k_max = NULL,
  sparsity = 0.5,
  selection_method = c("elbow", "threshold"),
  variance_threshold = 0.9,
  self_test = FALSE
)

Arguments

mat_list

A list of numeric matrices [subjects x features]. Required unless `self_test = TRUE`.

method

The decomposition method. One of `"pca"` (fast, SVD-based) or `"spca"` (sparse PCA; see `sparsity`).
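As a rough illustration of what an SVD-based joint variance curve can look like, the sketch below column-centers each view, binds the views side by side, and takes the squared singular values. This is an assumption about the approach, not the function's confirmed internals: the page does not say how the views are combined or weighted. The helper name `joint_variance_curve_sketch` is hypothetical, and a base-R data frame stands in for the tibble the function returns.

```r
# Hypothetical helper: one plausible way to build a joint variance curve.
# Assumes all views share the same subjects (rows) in the same order.
joint_variance_curve_sketch <- function(mat_list, k_max = NULL) {
  # Center each view's columns, then place the views side by side.
  centred <- lapply(mat_list, scale, center = TRUE, scale = FALSE)
  joint <- do.call(cbind, centred)

  # Squared singular values are proportional to variance per component.
  d <- svd(joint, nu = 0, nv = 0)$d
  prop <- d^2 / sum(d^2)

  # Assumed default: consider every available component when k_max is NULL.
  if (is.null(k_max)) k_max <- length(d)
  k_max <- min(k_max, length(d))

  data.frame(k = seq_len(k_max), cum_var = cumsum(prop)[seq_len(k_max)])
}

set.seed(1)
views <- list(matrix(rnorm(20 * 5), 20, 5), matrix(rnorm(20 * 8), 20, 8))
curve <- joint_variance_curve_sketch(views, k_max = 6)
print(curve)
```

The cumulative column is non-decreasing by construction, which is what both selection methods rely on.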

k_max

The maximum number of components to consider.

sparsity

A single sparsity parameter between 0 and 1. Used only when `method = "spca"`; ignored for `"pca"`.

selection_method

The method for choosing k. One of `"elbow"` (the point on the variance curve that deviates most from a straight line of improvement) or `"threshold"` (selection based on `variance_threshold`).
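The elbow rule can be sketched as follows. This is a minimal illustration, assuming the "straight line of improvement" is the chord joining the first and last points of the cumulative variance curve and that both axes are rescaled to [0, 1] before measuring distance. The helper name `elbow_k_sketch` is hypothetical, and the function's actual implementation may differ.

```r
# Hypothetical helper: pick the index farthest from the chord of the curve.
elbow_k_sketch <- function(cum_var) {
  n <- length(cum_var)
  if (n < 3) return(n)  # too few points to define an elbow

  span <- cum_var[n] - cum_var[1]
  if (span == 0) return(1L)  # flat curve: no improvement beyond k = 1

  # Rescale both axes to [0, 1] so neither dominates the distance.
  x <- (seq_len(n) - 1) / (n - 1)
  y <- (cum_var - cum_var[1]) / span

  # After rescaling, the chord is the line y = x, so the perpendicular
  # distance of each point from it is |y - x| / sqrt(2).
  dist <- abs(y - x) / sqrt(2)
  which.max(dist)
}

cum_var <- c(0.50, 0.75, 0.85, 0.88, 0.90, 0.91)
k_elbow <- elbow_k_sketch(cum_var)
print(k_elbow)
```

Rescaling matters here: without it, the k axis (integers) would swamp the variance axis (proportions), and the selected point would depend on how many components were evaluated.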

variance_threshold

The target cumulative proportion of variance to be explained, between 0 and 1. Only used when `selection_method = "threshold"`.
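A threshold rule of this kind is commonly implemented as "the smallest k whose cumulative variance reaches the target". The sketch below assumes that reading. The helper name `threshold_k_sketch` is hypothetical, and returning `NA` when the target is never reached is an illustrative choice, since the page does not say what the function does in that case.

```r
# Hypothetical helper: smallest k whose cumulative variance meets the target.
threshold_k_sketch <- function(cum_var, variance_threshold = 0.9) {
  hits <- which(cum_var >= variance_threshold)
  # Assumed behavior: signal with NA when no k reaches the target.
  if (length(hits) == 0) return(NA_integer_)
  hits[1]
}

cum_var <- c(0.50, 0.75, 0.85, 0.88, 0.90, 0.91)
k_thresh <- threshold_k_sketch(cum_var, variance_threshold = 0.8)
print(k_thresh)
```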

self_test

Logical. If `TRUE`, the function ignores all other arguments, runs a built-in suite of tests on simulated data, and prints the results. Use this to check that the function behaves as expected. Defaults to `FALSE`.

Value

If `self_test = FALSE`, a list containing:

  • `optimal_k`: The selected optimal number of components.

  • `joint_variance_curve`: A tibble with `k` and the cumulative proportion of variance explained at each `k`.

  • `plot`: A ggplot object visualizing the results.

If `self_test = TRUE`, the function prints the test results and returns invisibly.