varSel.highest.var.eSRG {MCRestimate}          R Documentation

Variable selection and cluster functions

Description

A collection of functions for variable selection and clustering. These functions are mainly used by the function MCRestimate.

Usage

identity(sample.gene.matrix, classfactor, ...)
varSel.highest.t.stat(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=500, ...)
varSel.highest.t.stat.eSRG(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=500, ...)

varSel.highest.var(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=2000, ...)
varSel.highest.var.eSRG(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=2000, ...)

varSel.green.int.max.eSRG(sample.gene.matrix, classfactor, theParameter=NULL, lambda=0.5, ...)
varSel.green.int.sec.eSRG(sample.gene.matrix, classfactor, theParameter=NULL, lambda=0.5, ...)

varSel.AUC(sample.gene.matrix, classfactor, theParameter=NULL, var.numbers=200, ...)
cluster.kmeans.mean(sample.gene.matrix, classfactor, theParameter=NULL, number.clusters=500, ...)

varSel.removeManyNA(sample.gene.matrix, classfactor, theParameter=NULL, NAthreshold=0.25, ...)
varSel.impute.NA(sample.gene.matrix, classfactor, theParameter=NULL, ...)

varSel.svm.rfe(sample.gene.matrix, classfactor, theParameter=NULL, ...)

Arguments

sample.gene.matrix a matrix in which the rows correspond to genes and the columns correspond to samples
classfactor a factor containing the values that should be predicted
theParameter Parameter whose meaning depends on the function. For 'cluster.kmeans.mean', either NULL or an output of the function kmeans. If it is NULL, kmeans is used to form clusters of the genes; otherwise the already existing clusters are used. In both cases the metagene intensities are calculated afterwards. For the other functions, either NULL or a logical vector which indicates for every gene whether it should be left out from further analysis or not
number.clusters parameter which specifies the number of clusters
var.numbers some methods need an argument which specifies how many variables should be selected
lambda additional parameter for some methods
NAthreshold numeric threshold (default 0.25); if the fraction of NA values of a variable is higher than this threshold, the variable will be deleted
... Further parameters
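
As a minimal sketch of the idea behind 'NAthreshold', assuming the threshold is compared against the fraction of NA values per gene (row), the filtering could look like this in base R. The helper name remove_many_na_sketch and the convention that TRUE marks a kept gene are assumptions made for illustration; this is not the code of varSel.removeManyNA, which may differ.

```r
# Hypothetical helper, not part of MCRestimate: drop genes (rows) whose
# fraction of NA values is higher than a threshold.
remove_many_na_sketch <- function(sample.gene.matrix, NAthreshold = 0.25) {
  # Fraction of missing values per gene (row)
  na.fraction <- rowMeans(is.na(sample.gene.matrix))
  # TRUE marks a gene that is kept (assumed convention for this sketch)
  keep <- na.fraction <= NAthreshold
  list(matrix = sample.gene.matrix[keep, , drop = FALSE], parameter = keep)
}

m <- matrix(c( 1, NA,  3,  4,
              NA, NA,  7,  8,
               9, 10, 11, 12), nrow = 3, byrow = TRUE)
res <- remove_many_na_sketch(m)
res$parameter   # the second gene has 50% NA values and is dropped
```

The result mirrors the list layout described under Value, so the logical vector can be kept and applied to other data later.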

Details

cluster.kmeans.mean performs a kmeans clustering with the number of clusters specified by 'number.clusters' and takes the mean of each cluster.

varSel.highest.var selects a number (specified by 'var.numbers') of variables with the highest variance.

varSel.AUC chooses the most discriminating variables according to the AUC criterion (the package ROC is required).

Some variable selection functions only work with MCRestimate.exprSetRG (name ends with .eSRG), and others only work with MCRestimate.default (no .eSRG).

varSel.svm.rfe performs feature selection by SVM RFE using a linear kernel. The number of selected features is optimised by internal cross-validation. Literature: Guyon et al. (2002) Machine Learning 46, 389-422.
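
The two simplest ideas described above, selection by highest variance and averaging of k-means clusters into metagenes, can be sketched in base R as follows. The helper names highest_var_sketch and kmeans_mean_sketch are hypothetical and the code only illustrates the technique under the assumption that genes are rows; it is not the package's implementation.

```r
# Hypothetical helpers, not part of MCRestimate.

# Keep the var.numbers genes (rows) with the highest variance across samples.
highest_var_sketch <- function(sample.gene.matrix, var.numbers = 2000) {
  gene.var <- apply(sample.gene.matrix, 1, var, na.rm = TRUE)
  n.keep <- min(var.numbers, nrow(sample.gene.matrix))
  # TRUE marks a gene that is kept (assumed convention for this sketch)
  keep <- rank(-gene.var, ties.method = "first") <= n.keep
  list(matrix = sample.gene.matrix[keep, , drop = FALSE], parameter = keep)
}

# Cluster genes with kmeans and average each cluster into one "metagene" row.
kmeans_mean_sketch <- function(sample.gene.matrix, number.clusters = 500) {
  km <- kmeans(sample.gene.matrix, centers = number.clusters)
  # rowsum() adds up the rows per cluster; dividing by cluster size gives means
  sums <- rowsum(sample.gene.matrix, group = km$cluster)
  sizes <- as.vector(table(km$cluster))
  list(matrix = sums / sizes, parameter = km)
}

set.seed(1)
m <- matrix(rnorm(60 * 4), nrow = 60)   # 60 genes x 4 samples
sel <- highest_var_sketch(m, var.numbers = 5)
clu <- kmeans_mean_sketch(m, number.clusters = 3)
dim(sel$matrix)   # 5 genes x 4 samples
dim(clu$matrix)   # 3 metagenes x 4 samples
```

Both helpers return a list with the reduced matrix and the information needed to repeat the reduction, following the layout described under Value.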

Value

Every function returns a list consisting of two components:

matrix the result matrix of the variable reduction or the clustering
parameter The parameter that is needed to reproduce the algorithm, i.e. a vector which indicates for every gene whether it will be left out from further analysis or not if a gene reduction is performed, or the output of the function kmeans for the clustering algorithm.
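
To illustrate how the 'parameter' component may be fed back in as 'theParameter', here is a minimal base R sketch. The helper select_sketch is hypothetical and only mimics the interface described on this page (a selection is computed when theParameter is NULL and reused otherwise); the TRUE-means-kept convention is an assumption of the sketch.

```r
# Hypothetical helper, not part of MCRestimate: select the highest-variance
# genes, or reuse a previous selection when theParameter is supplied.
select_sketch <- function(sample.gene.matrix, theParameter = NULL,
                          var.numbers = 2) {
  if (is.null(theParameter)) {
    gene.var <- apply(sample.gene.matrix, 1, var)
    # TRUE marks a gene that is kept (assumed convention for this sketch)
    theParameter <- rank(-gene.var, ties.method = "first") <= var.numbers
  }
  list(matrix = sample.gene.matrix[theParameter, , drop = FALSE],
       parameter = theParameter)
}

train <- rbind(c(1, 1, 1), c(0, 5, 10), c(2, 4, 6), c(3, 3, 4))
test  <- rbind(c(9, 9, 9), c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

fit <- select_sketch(train)                               # selection learned here
new <- select_sketch(test, theParameter = fit$parameter)  # same genes reused
fit$parameter
new$matrix
```

Reusing the stored selection means the second data set is reduced to the same genes as the first, without recomputing the variances on it.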

Author(s)

Markus Ruschhaupt <m.ruschhaupt@dkfz.de>, Patrick Warnat <p.warnat@dkfz-heidelberg.de>

See Also

MCRestimate

Examples

library(MCRestimate)
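# 30 genes (rows) x 2 samples (columns), drawn around the means 2, 4 and 7
m <- matrix(c(rnorm(10,2,0.5), rnorm(10,4,0.5), rnorm(10,7,0.5),
              rnorm(10,2,0.5), rnorm(10,4,0.5), rnorm(10,2,0.5)), ncol=2)
# cluster the genes into 3 groups and take the mean of each cluster
cluster.kmeans.mean(m, number.clusters=3)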

[Package MCRestimate version 1.4.0 Index]