kEstimateFast {pcaMethods} | R Documentation |
This is a simple estimator for the optimal number of components
when applying PCA or LLSimpute for missing value estimation.
No cross validation is performed; instead, the estimation quality is measured
on the observed values only, as Matrix[!missing] - Estimate[!missing]. This gives
a relatively rough estimate, but it is cheap: the number of iterations equals
the length of the parameter evalPcs.
Note: this function does not work with LLSimpute.
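The procedure described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `rank_k_estimate` is a hypothetical helper that mean-fills the missing cells and takes a rank-k SVD reconstruction as a stand-in for a real method such as ppca, and the error is a plain RMSD on the observed cells.

```r
# Hypothetical stand-in for a PCA-based estimator (NOT pcaMethods code):
# mean-fill missing cells, then reconstruct the matrix from k components.
rank_k_estimate <- function(X, k) {
  col_means <- colMeans(X, na.rm = TRUE)
  filled <- X
  idx <- which(is.na(X), arr.ind = TRUE)
  filled[idx] <- col_means[idx[, "col"]]
  centered <- sweep(filled, 2, col_means)
  s <- svd(centered, nu = k, nv = k)
  approx <- s$u %*% diag(s$d[1:k], nrow = k) %*% t(s$v)
  sweep(approx, 2, col_means, "+")
}

# Toy data: observations in rows, variables in columns, 5% missing.
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
X[sample(length(X), 10)] <- NA
missing <- is.na(X)

# One iteration per entry of evalPcs; no cross validation.
evalPcs <- 1:3
err <- sapply(evalPcs, function(k) {
  est <- rank_k_estimate(X, k)
  # Compare estimate and data on the observed (non-missing) cells only.
  sqrt(mean((X[!missing] - est[!missing])^2))
})

# The candidate with the smallest error is reported as the best choice.
best <- evalPcs[which.min(err)]
```

Because each candidate is fitted exactly once, the cost grows linearly with `length(evalPcs)`, which is what makes this approach cheaper than a cross-validated estimate.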
As error measure, either the NRMSEP (see Feten et al., 2005) or the Q2 distance is used. The NRMSEP basically normalises the RMSD between the original data and the estimate by the variable-wise variance. The reason for this is that a higher variance will generally lead to a higher estimation error. If the number of samples is small, the gene-wise variance may become an unstable criterion, and the Q2 distance should be used instead. The Q2 distance should also be used if variance normalisation was applied previously.
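To make the two error measures concrete, here is a minimal sketch in base R. Both functions are hypothetical and rest on assumptions: `nrmsep_sketch` is one plausible reading of "RMSD normalised by the variable-wise variance", and `q2_sketch` uses a common definition of Q2 (one minus the squared error divided by the sum of squares of the data). The exact formulas used by the package may differ.

```r
# Assumed reading of the NRMSEP: per-variable mean squared error divided by
# the variance of that variable, averaged over variables, then square-rooted.
nrmsep_sketch <- function(original, estimate) {
  observed <- !is.na(original)
  sq_err <- (original - estimate)^2
  sq_err[!observed] <- NA
  mse_per_var <- colMeans(sq_err, na.rm = TRUE)
  var_per_var <- apply(original, 2, var, na.rm = TRUE)
  sqrt(mean(mse_per_var / var_per_var))
}

# Assumed definition of Q2: no variance normalisation is involved.
q2_sketch <- function(original, estimate) {
  observed <- !is.na(original)
  press <- sum((original[observed] - estimate[observed])^2)
  1 - press / sum(original[observed]^2)
}

# Two variables that differ only in scale (factor 10).
original <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)
# A deliberately crude estimate: every value replaced by its column mean.
estimate <- matrix(colMeans(original), nrow = 4, ncol = 2, byrow = TRUE)

nrmsep_value <- nrmsep_sketch(original, estimate)
q2_value <- q2_sketch(original, estimate)
```

In this toy case both variables contribute equally to `nrmsep_value` despite the tenfold difference in scale, which is the point of the variance normalisation. With only a handful of samples the per-variable variances in the denominator are themselves noisy, which is why the text recommends Q2 in that situation.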
kEstimateFast(Matrix, method = "ppca", evalPcs = 1:3, em = "nrmsep",
  verbose = interactive(), ...)
Matrix |
matrix – numeric matrix containing observations in rows and
variables in columns |
method |
character – One of "ppca", "bpca", "svdImpute" or "nipals" |
evalPcs |
numeric – The numbers of principal components to evaluate,
or cluster sizes if used with llsImpute.
Should be a vector of integer values, e.g. evalPcs = 1:10
or evalPcs = c(2, 5, 8). The NRMSEP is calculated for each number of components. |
em |
character – The error measure. This can be nrmsep or q2 |
verbose |
boolean – If TRUE, the NRMSEP and the variance are printed
to the console at each iteration. |
... |
Further arguments to pca |
list |
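Returns a list. Its elements include minNPcs (the best number of components found) and eError (the estimation error for each evaluated number of components), both of which are used in the example below.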
|
Wolfram Stacklies
CAS-MPG Partner Institute for Computational Biology, Shanghai, China
wolfram.stacklies@gmail.com
library(pcaMethods)

## Load a sample metabolite dataset with 5% missing values (metaboliteData)
data(metaboliteData)

# Estimate best number of PCs with ppca for component 2:4
esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em = "nrmsep")

# Plot the result
barplot(drop(esti$eError), xlab = "Components", ylab = "NRMSEP (1 iterations)")

# The best k value is:
print(esti$minNPcs)