GeneSelection {CMA} | R Documentation |
For different learning data sets as defined by the argument learningsets,
this method ranks the genes from the most relevant to the least relevant using
one of various 'filter' criteria or provides a sparse collection of variables
(Lasso, ElasticNet, Boosting). The results are typically used for variable selection for
the classification procedure that follows.
For S4 class information, see GeneSelection-methods.
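The filter-style ranking described above can be illustrated with a minimal base-R sketch. This is not the CMA implementation: the simulated data, the object names (X, y, tstat, ranking) and the choice of an equal-variance t statistic are assumptions made for illustration only. GeneSelection would repeat such a ranking for each learning set.

```r
# Base-R sketch of a t-statistic filter; NOT the CMA implementation.
set.seed(1)
n <- 20                                        # number of samples (assumed)
p <- 50                                        # number of genes (assumed)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # rows: samples, columns: genes
y <- rep(c(0, 1), each = n / 2)                # binary class labels
X[y == 1, 1:3] <- X[y == 1, 1:3] + 3           # make genes 1-3 informative

# Equal-variance two-sample t statistic for every gene.
tstat <- apply(X, 2, function(g) {
  t.test(g[y == 1], g[y == 0], var.equal = TRUE)$statistic
})

# Rank genes by decreasing absolute statistic (most relevant first).
ranking <- order(abs(tstat), decreasing = TRUE)
head(ranking, 5)
```

The vector ranking holds gene (column) indices, so ranking[1] is the gene with the largest absolute statistic.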
GeneSelection(X, y, f, learningsets, method = c("t.test", "welch.test", "wilcox.test", "f.test", "kruskal.test", "limma", "rfe", "rf", "lasso", "elasticnet", "boosting", "golub"), scheme, trace = TRUE, ...)
X |
Gene expression data. Can be one of the following:
|
y |
Class labels. Can be one of the following:
|
f |
A two-sided formula, if X is a data.frame. The
left part corresponds to class labels, the right to variables. |
learningsets |
An object of class learningsets. May
be missing, in which case the complete dataset is used as
the learning set. |
method |
A character specifying the method to be used:
... argument.
boosting: compBoostCMA. Take care that appropriate hyperparameters are passed by the ... argument.
golub: golub. |
scheme |
The scheme to be used in the case of a non-binary response. Must be one
of "pairwise", "one-vs-all" or "multiclass". The
last case only makes sense if method is one of f.test, limma, rf, boosting,
which can be applied directly to the multiclass case. |
trace |
Should the progress be traced? Default is TRUE. |
... |
Further arguments passed to the function performing variable selection, see method. |
An object of class genesel.
Most of the methods described above are suitable only for the binary classification case. The only ones that can be used without restriction in the multiclass case are
f.test
kruskal.test
rf
boosting
For the rest, pairwise or one-vs-all schemes are used.
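To make the one-vs-all idea concrete, here is a minimal base-R sketch. It is not the CMA implementation: the simulated three-class data, the helper abs_t and the use of the Welch t statistic (the default of t.test) are assumptions chosen for illustration. A pairwise scheme would instead compare every pair of classes.

```r
# Base-R sketch of a one-vs-all scheme; NOT the CMA implementation.
set.seed(2)
n <- 30                                        # number of samples (assumed)
p <- 40                                        # number of genes (assumed)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # rows: samples, columns: genes
y <- factor(rep(c("a", "b", "c"), each = n / 3))
X[y == "a", 1] <- X[y == "a", 1] + 3           # gene 1 separates class "a"
X[y == "b", 2] <- X[y == "b", 2] + 3           # gene 2 separates class "b"
X[y == "c", 3] <- X[y == "c", 3] + 3           # gene 3 separates class "c"

# Hypothetical helper: absolute Welch t statistic of each gene
# for a binary split of the samples.
abs_t <- function(X, in_class) {
  apply(X, 2, function(g) abs(t.test(g[in_class], g[!in_class])$statistic))
}

# One ranking per class: that class against all remaining samples.
one_vs_all <- lapply(levels(y), function(k) {
  order(abs_t(X, y == k), decreasing = TRUE)
})
names(one_vs_all) <- levels(y)
sapply(one_vs_all, head, 3)
```

Each element of one_vs_all is a separate ranking of all genes, so a binary-only criterion yields one gene list per class rather than a single list.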
Martin Slawski martin.slawski@campus.lmu.de
Anne-Laure Boulesteix http://www.slcmsr.net/boulesteix
Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003).
Statistical issues in microarray data analysis.
Methods in Molecular Biology 224, 111-136.
Guyon, I., Weston, J., Barnhill, S., Vapnik, V. (2002).
Gene Selection for Cancer Classification using Support Vector Machines.
Machine Learning, 46, 389-422
Zou, H., Hastie, T. (2005).
Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society B, 67(2), 301-320
Buehlmann, P., Yu, B. (2003).
Boosting with the L2 loss: Regression and Classification.
Journal of the American Statistical Association, 98, 324-339
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R. (2004).
Least Angle Regression.
Annals of Statistics, 32, 407-499
Buehlmann, P., Yu, B. (2006).
Sparse Boosting.
Journal of Machine Learning Research, 7, 1001-1024
filter, GenerateLearningsets, tune, classification
library(CMA)
# load Golub AML/ALL data
data(golub)
### extract class labels
golubY <- golub[,1]
### extract gene expression (all columns except the class labels)
golubX <- as.matrix(golub[,-1])
### Generate five different learningsets
set.seed(111)
five <- GenerateLearningsets(y=golubY, method = "CV", fold = 5, strat = TRUE)
### simple t-test:
selttest <- GeneSelection(golubX, golubY, learningsets = five, method = "t.test")
### show result:
show(selttest)
toplist(selttest, k = 10, iter = 1)
plot(selttest, iter = 1)