This function creates simulated multi-view data with a known ground truth that contains both shared and modality-specific latent signals.

generate_structured_multiview_data(
  n_subjects,
  n_features,
  k_shared,
  k_specific,
  noise_sd = 0.1,
  sparsity_level = 0.5,
  seed = NULL
)

Arguments

n_subjects

The number of subjects (rows).

n_features

A vector of integers specifying the number of features (columns) for each view, with one entry per view.

k_shared

The number of latent components common to ALL views.

k_specific

A vector of integers specifying the number of view-specific latent components for each view. If a single number is given, it is recycled across all views.

noise_sd

The standard deviation of the Gaussian noise added to the data. Defaults to 0.1.

sparsity_level

The proportion of elements in the shared loading matrix (V_f) to set to zero; must be a value between 0 and 1. Defaults to 0.5.

seed

An optional random seed for reproducibility.
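
As a rough illustration of two of the arguments above, the following base R sketch shows one way a single `k_specific` value could be recycled to one entry per view, and how a proportion `sparsity_level` of a loading matrix could be zeroed. It is a sketch under stated assumptions, not the function's actual implementation: the input values are hypothetical, and the use of `rep_len()` and uniform random selection of the zeroed entries are assumptions.

```r
# Hypothetical inputs, chosen only for illustration.
n_features     <- c(20, 30, 40)  # three views
k_shared       <- 3
k_specific     <- 2              # a single value, to be recycled
sparsity_level <- 0.5

# Recycle k_specific so there is one entry per view (assumed behaviour).
k_specific <- rep_len(k_specific, length(n_features))

# Build a dense [p x k_shared] loading matrix for the first view,
# then zero out a fixed proportion of its entries chosen at random.
set.seed(1)
V <- matrix(rnorm(n_features[1] * k_shared),
            nrow = n_features[1], ncol = k_shared)
n_zero <- round(sparsity_level * length(V))
V[sample(length(V), n_zero)] <- 0

# Proportion of loadings that are now exactly zero.
prop_zero <- mean(V == 0)
```

With `sparsity_level = 0` the matrix stays dense, and with `sparsity_level = 1` every loading is zeroed, so intermediate values control how many features carry the shared signal.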

Value

A list containing:

data_list

A named list of the final data matrices, one per view, each of size [n x p] (subjects by that view's features).

ground_truth

A list containing the true latent structures:

  • `U_shared`: The ground truth shared basis [n x k_shared]. This is what `simlr` should aim to recover.

  • `V_shared`: A list of the ground truth shared loading matrices [p x k_shared]. This is what the `simlr` `v` matrices should be compared against.

  • `U_specific`: A list of the ground truth modality-specific bases.

  • `U_combined`: The full latent basis [n x (k_shared + sum(k_specific))] that was used to generate the data.
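
The structure of the returned ground truth can be mimicked in a few lines of base R. The sketch below assumes a simple additive model (shared part plus view-specific part plus Gaussian noise), which is an assumption about the generator and not a description of its internals. The sizes, the view names, and the SVD-based `U_hat` are hypothetical stand-ins; `U_hat` is not `simlr` output. It only shows how an estimated basis might be compared against `U_shared` using canonical correlations from `stats::cancor()`.

```r
set.seed(42)

# Hypothetical sizes for illustration.
n          <- 100         # subjects
p          <- c(20, 30)   # features per view
k_shared   <- 2
k_specific <- c(1, 1)

# Latent bases: one shared basis, plus one specific basis per view.
U_shared   <- matrix(rnorm(n * k_shared), n, k_shared)
U_specific <- lapply(k_specific, function(k) matrix(rnorm(n * k), n, k))
U_combined <- cbind(U_shared, do.call(cbind, U_specific))

# Assumed additive model for each view:
#   X_v = U_shared V_shared_v' + U_specific_v V_specific_v' + noise
data_list <- lapply(seq_along(p), function(v) {
  V_sh  <- matrix(rnorm(p[v] * k_shared), p[v], k_shared)
  V_sp  <- matrix(rnorm(p[v] * k_specific[v]), p[v], k_specific[v])
  noise <- matrix(rnorm(n * p[v], sd = 0.1), n, p[v])
  U_shared %*% t(V_sh) + U_specific[[v]] %*% t(V_sp) + noise
})
names(data_list) <- paste0("view", seq_along(p))

# Stand-in estimate (NOT simlr): leading left singular vectors of view 1.
U_hat <- svd(scale(data_list[[1]], scale = FALSE))$u[, 1:3]

# Canonical correlations between the true shared basis and the estimate;
# values near 1 indicate that U_hat spans the shared subspace.
rho <- cancor(U_shared, U_hat)$cor
```

Canonical correlations are used here because a latent basis is only identifiable up to rotation and sign, so comparing subspaces is more meaningful than comparing columns one by one.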