generate_structured_multiview_data.Rd
Simulates multi-view data from a known ground truth in which each view combines latent signals shared across all views with latent signals specific to that view (modality).
generate_structured_multiview_data(
n_subjects,
n_features,
k_shared,
k_specific,
noise_sd = 0.1,
sparsity_level = 0.5,
seed = NULL
)
n_subjects: The number of subjects (rows).
n_features: A vector of integers specifying the number of features for each view, one entry per view.
k_shared: The number of latent components common to ALL views.
k_specific: A vector of integers specifying the number of unique latent components for EACH view. If a single number is given, it is recycled across all views.
noise_sd: The standard deviation of the Gaussian noise added to the data. Defaults to 0.1.
sparsity_level: The proportion of elements in the shared loading matrix (V_f) to set to zero. A value between 0 and 1. Defaults to 0.5.
seed: An optional random seed for reproducibility. Defaults to NULL (no seed is set).
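The arguments above describe a generative model that can be sketched in base R. The following is a minimal illustration under stated assumptions, not the package's source: the function name `sketch_multiview`, the output names `data` and `truth`, the `View1`, `View2` labels, the Gaussian draws for bases and loadings, and the dense view-specific loadings `W_i` are all hypothetical choices made for this example.

```r
# Hypothetical re-implementation for illustration only; the real function may differ.
sketch_multiview <- function(n_subjects, n_features, k_shared, k_specific,
                             noise_sd = 0.1, sparsity_level = 0.5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n_views <- length(n_features)
  # Recycle a single k_specific value so that every view has an entry.
  k_specific <- rep(k_specific, length.out = n_views)

  # Shared latent basis: one column per shared component.
  U_shared <- matrix(rnorm(n_subjects * k_shared), n_subjects, k_shared)

  data_list <- vector("list", n_views)
  V_shared <- vector("list", n_views)
  U_specific <- vector("list", n_views)
  for (i in seq_len(n_views)) {
    p <- n_features[i]
    # Shared loadings, with a proportion `sparsity_level` of entries set to zero.
    V <- matrix(rnorm(p * k_shared), p, k_shared)
    n_zero <- floor(sparsity_level * length(V))
    V[sample(length(V), n_zero)] <- 0
    # View-specific basis and (assumed dense) view-specific loadings.
    U_i <- matrix(rnorm(n_subjects * k_specific[i]), n_subjects, k_specific[i])
    W_i <- matrix(rnorm(p * k_specific[i]), p, k_specific[i])
    # Additive Gaussian noise with standard deviation `noise_sd`.
    noise <- matrix(rnorm(n_subjects * p, sd = noise_sd), n_subjects, p)
    data_list[[i]] <- U_shared %*% t(V) + U_i %*% t(W_i) + noise
    V_shared[[i]] <- V
    U_specific[[i]] <- U_i
  }
  names(data_list) <- paste0("View", seq_len(n_views))
  list(
    data = data_list,
    truth = list(
      U_shared = U_shared,
      V_shared = V_shared,
      U_specific = U_specific,
      U_combined = do.call(cbind, c(list(U_shared), U_specific))
    )
  )
}

# Two views with 20 and 30 features; k_specific = 2 is recycled to c(2, 2).
sim <- sketch_multiview(n_subjects = 50, n_features = c(20, 30),
                        k_shared = 3, k_specific = 2, seed = 1)
dim(sim$data$View1)
dim(sim$truth$U_combined)
```

In this sketch each view is the sum of a shared part, a view-specific part, and noise, so a method that recovers only the shared structure has to separate `U_shared` from the view-specific bases.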
A list containing:
A named list of the final data matrices, one [n x p] matrix per view, where n is the number of subjects and p is that view's number of features.
A list containing the true latent structures:
`U_shared`: The ground truth shared basis [n x k_shared]. This is what `simlr` should aim to recover.
`V_shared`: A list of the ground truth shared loading matrices [p x k_shared]. This is what the `simlr` `v` matrices should be compared against.
`U_specific`: A list of the ground truth modality-specific bases, one per view.
`U_combined`: The full latent basis [n x (k_shared + sum(k_specific))] that was used to generate the data.
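When comparing an estimated basis against `U_shared`, note that a latent basis is typically identifiable only up to rotation, sign, and scale, so column-by-column correlations can understate the agreement. One possible check, sketched below, uses canonical correlations from `stats::cancor`, which compare the subspaces spanned by the two bases. The matrices `U_true` and `U_hat` are stand-ins simulated for this example; in practice `U_true` would be the returned `U_shared` and `U_hat` the estimate from the method under test.

```r
set.seed(42)
n <- 100
k <- 3

# Stand-in for the ground-truth shared basis [n x k].
U_true <- matrix(rnorm(n * k), n, k)

# Hypothetical estimate: a randomly rotated, slightly noisy copy of the truth.
R <- qr.Q(qr(matrix(rnorm(k * k), k, k)))  # random orthogonal matrix
U_hat <- U_true %*% R + matrix(rnorm(n * k, sd = 0.05), n, k)

# Canonical correlations are unaffected by rotating or rescaling either basis.
cc <- cancor(U_true, U_hat)$cor
print(round(cc, 3))
```

Values close to 1 for all `k` canonical correlations indicate that the estimated basis spans nearly the same subspace as the ground truth. The same idea could be applied per view to the loading matrices in `V_shared`, although sparse loadings may call for a comparison that also looks at the support of the nonzero entries.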