GenerateLearningsets {CMA} | R Documentation |
With very small sample sizes, a single division into learning set and test set does not give an accurate estimate of classification performance. Several different divisions should therefore be generated and their results aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005), whose terminology is adopted.
This function is usually the basis for all deeper analyses.
GenerateLearningsets(n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"), fold = NULL, niter = NULL, ntrain = NULL, strat = FALSE)
n |
The total number of observations in the available data set. May be missing
if y is provided instead. |
y |
A vector of class labels, either numeric or a factor.
Must be given if strat=TRUE or n is not specified. |
method |
The scheme used to generate divisions into learning sets and test sets.
One of "LOOCV" (leave-one-out cross-validation), "CV" (cross-validation),
"MCCV" (Monte Carlo cross-validation) or "bootstrap". |
fold |
The number of CV groups (folds). Used only when method="CV". |
niter |
Number of iterations; its meaning depends on method (see the details after the argument list). |
ntrain |
Number of observations in the learning sets. Used
only when method="MCCV" or method="bootstrap". |
strat |
Logical. Should stratified sampling be performed,
i.e. should the proportion of observations from each class in the learning
sets be the same as in the whole data set?
Does not apply to method="LOOCV". |
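To make the idea behind strat=TRUE concrete, the following base R sketch assigns observations to folds class by class, so that each class is spread as evenly as possible over the folds. The helper stratified_folds is hypothetical and written only for illustration; it is not the code used inside GenerateLearningsets, and CMA may balance the groups differently.

```r
# Hypothetical helper (not part of CMA): give every observation a fold label
# such that each class is distributed as evenly as possible across the folds.
stratified_folds <- function(y, fold) {
  y <- as.factor(y)
  folds <- integer(length(y))
  for (cl in levels(y)) {
    idx <- which(y == cl)
    # shuffle the positions of this class (sample.int avoids the
    # length-one pitfall of sample())
    idx <- idx[sample.int(length(idx))]
    # deal the shuffled positions out to folds 1, 2, ..., fold, 1, 2, ...
    folds[idx] <- rep_len(seq_len(fold), length(idx))
  }
  folds
}

set.seed(113)
y <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
folds <- stratified_folds(y, fold = 5)
table(class = y, fold = folds)  # per class, fold counts differ by at most 1
```

Because every class is dealt out starting at fold 1, the first folds can end up slightly larger than the last ones; a more careful version would rotate the starting fold between classes.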
When method="CV", niter gives the number of times the whole CV procedure is repeated. The output matrix then has fold x niter rows.
When method="MCCV" or method="bootstrap", niter is simply the number of learning sets considered.
Note that method="CV", fold=n is equivalent to method="LOOCV".
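The row count for repeated CV and the equivalence of fold=n with LOOCV can be checked with a small base R sketch. The function cv_learnmatrix below is a hypothetical stand-in, not CMA's implementation: it stores one learning set per row and, as an assumption of this sketch, pads shorter rows with 0 so that the result is a rectangular matrix.

```r
# Hypothetical helper (not CMA's implementation): each row holds the indices
# of the learning set for one CV split; the whole procedure runs niter times.
cv_learnmatrix <- function(n, fold, niter = 1) {
  rows <- list()
  for (i in seq_len(niter)) {
    # random group label for every observation, group sizes as equal as possible
    groups <- sample(rep_len(seq_len(fold), n))
    for (k in seq_len(fold)) {
      # learning set = all observations outside group k
      rows[[length(rows) + 1]] <- which(groups != k)
    }
  }
  width <- max(lengths(rows))
  # pad shorter learning sets with 0 (a choice made for this sketch)
  t(vapply(rows, function(r) c(r, rep(0L, width - length(r))), integer(width)))
}

set.seed(1)
m <- cv_learnmatrix(n = 40, fold = 5, niter = 3)
dim(m)    # 15 rows (fold x niter), 32 columns

loo <- cv_learnmatrix(n = 40, fold = 40)
dim(loo)  # 40 rows, 39 columns: every row leaves out exactly one observation
```

With fold equal to n, every group contains a single observation, so each learning set omits exactly one observation and each observation is omitted exactly once, which is the leave-one-out scheme.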
An object of class learningsets
Martin Slawski martin.slawski@campus.lmu.de
Anne-Laure Boulesteix http://www.slcmsr.net/boulesteix
Braga-Neto, U.M., Dougherty, E.R. (2003).
Is cross-validation valid for small-sample microarray classification?
Bioinformatics, 20(3), 374-380
Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005).
Prediction error estimation: a comparison of resampling methods.
Bioinformatics, 21(15), 3301-3307
learningsets, GeneSelection, tune, classification
# LOOCV
loo <- GenerateLearningsets(n=40, method="LOOCV")
show(loo)
# five-fold-CV
CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
show(CV5)
# MCCV
mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
show(mccv)
# Bootstrap
boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
# stratified five-fold-CV
set.seed(113)
classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
show(CV5strat)
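For readers who want to see what a single MCCV or bootstrap division looks like, here is a minimal base R sketch that does not call CMA. It assumes the usual definitions: MCCV draws ntrain observations without replacement, while the bootstrap draws n observations with replacement and uses the observations never drawn as the test set. The variable names are illustrative only, and the internals of GenerateLearningsets may differ.

```r
set.seed(42)
n <- 40
ntrain <- 30

# One MCCV division: ntrain distinct observations form the learning set,
# the remaining n - ntrain observations form the test set.
mccv_learn <- sort(sample.int(n, ntrain))
mccv_test <- setdiff(seq_len(n), mccv_learn)

# One bootstrap division: n draws with replacement form the learning set
# (duplicates are expected); observations never drawn form the test set.
boot_learn <- sample.int(n, n, replace = TRUE)
boot_test <- setdiff(seq_len(n), boot_learn)

length(mccv_test)           # 10
anyDuplicated(boot_learn) > 0
```

Repeating either block niter times and stacking the learning sets row by row gives one learning set per iteration.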