tune {CMA}    R Documentation
Description

Most classifiers implemented in this package depend on one or even several
hyperparameters (see Details) that should be optimized to obtain good (and
comparable!) results. As tuning scheme, we propose three-fold cross-validation
on each learningset (for fixed selected variables). Note that learningsets
usually do not contain the complete dataset, so tuning involves a second level
of splitting the dataset. Increasing the number of folds leads to larger inner
training sets (and possibly to higher accuracy), but also to higher computing
times.

For S4 method information, see tune-methods.
Usage

tune(X, y, f, learningsets, genesel, genesellist = list(), nbgene,
     classifier, fold = 3, strat = FALSE, grids = list(), trace = TRUE, ...)
Arguments

X
    Gene expression data. Can be given as a matrix (rows correspond to
    observations, columns to variables), as a data.frame (together with a
    formula f), or as an object of class ExpressionSet.

y
    Class labels. Can be given as a numeric vector, a factor, or, if X is an
    ExpressionSet, the name of the phenotype variable as a character.
f
    A two-sided formula, if X is a data.frame. The left part corresponds to
    the class labels, the right part to the variables.

learningsets
    An object of class learningsets. May be missing; then the complete
    dataset is used as learning set.

genesel
    Optional (but usually recommended) object of class genesel containing
    variable importance information for the argument learningsets (see the
    sketch following this argument list).

genesellist
    In the case that the argument genesel is missing, this is an argument
    list passed to GeneSelection. If both genesel and genesellist are
    missing, no variable selection is performed.

nbgene
    Number of best genes to be kept for classification, based on either
    genesel or the call to GeneSelection using genesellist. In the case that
    both are missing, this argument is not necessary.

classifier
    Name of a function ending with CMA indicating the classifier to be used.

fold
    The number of cross-validation folds used within each learningset.
    Default is 3. Increasing fold will lead to higher computing times.

strat
    Should stratified cross-validation according to the class proportions in
    the complete dataset be used? Default is FALSE.

grids
    A named list. The names correspond to the arguments to be tuned, e.g. k
    (the number of nearest neighbours) for knnCMA, or cost for svmCMA. Each
    element is a numeric vector defining the grid of candidate values.
    Several hyperparameters can be tuned simultaneously (though this requires
    much more time). By default, grids is an empty list; in that case, a
    pre-defined list will be used, see Details.

trace
    Should progress be traced? Default is TRUE.

...
    Further arguments to be passed to classifier; these must of course not
    include the arguments to be tuned.
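Roughly, the intended division of labour between genesel and genesellist can
be sketched as follows (not run; a minimal sketch that mirrors the Examples
section below and assumes the golub data shipped with the package; the object
names gsel, tune_knn and tune_scda are only illustrative): gene selection is
computed once per learningset and then reused when tuning several classifiers.

library(CMA)
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
## variable importance, computed once per learningset
gsel <- GeneSelection(X = golubX, y = golubY, learningsets = lset,
                      method = "t.test")
## reuse 'gsel' via the 'genesel' argument instead of passing 'genesellist'
tune_knn  <- tune(X = golubX, y = golubY, learningsets = lset, genesel = gsel,
                  nbgene = 100, classifier = knnCMA, grids = list(k = 1:10))
tune_scda <- tune(X = golubX, y = golubY, learningsets = lset, genesel = gsel,
                  nbgene = 100, classifier = scdaCMA,
                  grids = list(delta = c(0.1, 0.25, 0.5, 1, 2, 5)))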
Details

The following default settings are used if the argument grids is an empty
list (a sketch of supplying a custom grid follows the list):

gbmCMA
    n.trees = c(50, 100, 200, 500, 1000)

compBoostCMA
    mstop = c(50, 100, 200, 500, 1000)

LassoCMA
    norm.fraction = seq(from = 0.1, to = 0.9, length = 9)

ElasticNetCMA
    norm.fraction = seq(from = 0.1, to = 0.9, length = 5), lambda2 = 2^{-(5:1)}

plrCMA
    lambda = 2^{-4:4}

pls_ldaCMA
    comp = 1:10

pls_lrCMA
    comp = 1:10

pls_rfCMA
    comp = 1:10

rfCMA
    mtry = ceiling(c(0.1, 0.25, 0.5, 1, 2)*sqrt(ncol(X))), nodesize = c(1, 2, 3)

knnCMA
    k = 1:10

pknnCMA
    k = 1:10

scdaCMA
    delta = c(0.1, 0.25, 0.5, 1, 2, 5)

pnnCMA
    sigma = c(2^{-2:2})

nnetCMA
    size = 1:5, decay = c(0, 2^{-(4:1)})

svmCMA, kernel = "linear"
    cost = c(0.1, 1, 5, 10, 50, 100, 500)

svmCMA, kernel = "radial"
    cost = c(0.1, 1, 5, 10, 50, 100, 500), gamma = 2^{-2:2}

svmCMA, kernel = "polynomial"
    cost = c(0.1, 1, 5, 10, 50, 100, 500), degree = 2:4
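Supplying a custom grid simply replaces the corresponding default. A minimal
sketch (not run), using a deliberately coarse cost grid for a linear SVM and
reusing golubX, golubY and lset from the sketch above; the candidate values
are illustrative choices, not package defaults:

## user-defined grid: only 'cost' is tuned, the kernel is fixed via '...'
tune_svm <- tune(X = golubX, y = golubY, learningsets = lset,
                 genesellist = list(method = "t.test"), nbgene = 100,
                 classifier = svmCMA, kernel = "linear",
                 grids = list(cost = c(1, 10, 100)))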
Value

An object of class tuningresult.
Note

The computation time can be very high. Note that for each learningset, the
classifier must be trained fold times the number of possible hyperparameter
combinations. For example, if there are fifty learningsets, fold = 3 and two
hyperparameters (each with 5 candidate values) are tuned, 50 * 3 * 25 = 3750
training runs are necessary!
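The count can be checked directly (a trivial worked computation for the
figures above):

## trainings per learningset: fold * number of grid points
n_learningsets <- 50
fold <- 3
grid_sizes <- c(5, 5)    # two hyperparameters, 5 candidate values each
n_learningsets * fold * prod(grid_sizes)
## [1] 3750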
Author(s)

Martin Slawski <martin.slawski@campus.lmu.de>

Anne-Laure Boulesteix <http://www.slcmsr.net/boulesteix>
See Also

tuningresult, GeneSelection, classification
Examples

## Not run: 
### simple example for a one-dimensional grid, using compBoostCMA.
### dataset
data(golub)
golubY <- golub[,1]
golubX <- as.matrix(golub[,-1])
### learningsets
set.seed(111)
lset <- GenerateLearningsets(y = golubY, method = "CV", fold = 5, strat = TRUE)
### tuning after gene selection with the t.test
tuneres <- tune(X = golubX, y = golubY, learningsets = lset,
                genesellist = list(method = "t.test"),
                classifier = compBoostCMA, nbgene = 100,
                grids = list(mstop = c(50, 100, 250, 500, 1000)))
### inspect results
show(tuneres)
best(tuneres)
plot(tuneres, iter = 3)

## End(Not run)
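A tuning result is typically reused for the final cross-validated comparison.
A minimal sketch (not run) continuing the example above, assuming the tuneres
argument of classification as documented on its own help page; class_res is
an illustrative name:

## pass the tuning results on to classification, then evaluate
class_res <- classification(X = golubX, y = golubY, learningsets = lset,
                            genesellist = list(method = "t.test"),
                            classifier = compBoostCMA, nbgene = 100,
                            tuneres = tuneres)
evaluation(class_res, measure = "misclassification")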