GenerateLearningsets {CMA}    R Documentation

Repeated Divisions into learn- and test sets

Description

When sample sizes are very small, a single division into learning set and test set does not give accurate information about the classification performance. Therefore, several different divisions should be used and their results aggregated. The implemented methods are discussed in Braga-Neto and Dougherty (2003) and Molinaro et al. (2005), whose terminology is adopted.

This function is usually the basis for all deeper analyses.

Usage

GenerateLearningsets(n, y, method = c("LOOCV", "CV", "MCCV", "bootstrap"), fold = NULL, niter = NULL, ntrain = NULL, strat = FALSE)

Arguments

n The total number of observations in the available data set. May be missing if y is provided instead.
y A vector of class labels, either numeric or a factor. Must be given if strat=TRUE or n is not specified.
method Which kind of scheme should be used to generate divisions into learning sets and test sets? Can be one of the following:
"LOOCV"
Leaving-One-Out Cross Validation.
"CV"
(Ordinary) Cross-Validation. Note that fold must also be specified.
"MCCV"
Monte-Carlo Cross Validation, i.e. random divisions into learning sets with ntrain (see below) observations and test sets with n - ntrain observations.
"bootstrap"
Learning sets are generated by drawing ntrain times with replacement from all observations. Those not drawn at all form the test set.
fold Gives the number of CV-groups. Used only when method="CV".
niter Number of iterations (see Details).
ntrain Number of observations in the learning sets. Used only when method="MCCV" or method="bootstrap".
strat Logical. Should stratified sampling be performed, i.e. should the proportion of observations from each class in the learning sets be the same as in the whole data set?
Does not apply for method = "LOOCV".
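The MCCV, bootstrap and stratified schemes described above can be sketched in base R as follows. This is only an illustration, not the CMA implementation: the helper names mccv_split, bootstrap_split and strat_mccv_split are hypothetical, and the stratified helper takes a learning fraction frac instead of ntrain, which is an assumption made to keep the sketch short.

```r
# Hypothetical helpers (not part of CMA) illustrating single divisions
# of n observations into a learning set and a test set.

# MCCV: ntrain distinct observations form the learning set,
# the remaining n - ntrain observations form the test set.
mccv_split <- function(n, ntrain) {
  learn <- sample(n, ntrain)
  list(learn = learn, test = setdiff(seq_len(n), learn))
}

# Bootstrap: draw ntrain times with replacement; observations that are
# never drawn form the test set.
bootstrap_split <- function(n, ntrain = n) {
  learn <- sample(n, ntrain, replace = TRUE)
  list(learn = learn, test = setdiff(seq_len(n), learn))
}

# Stratified MCCV (assumption: a fraction 'frac' of each class is drawn),
# so class proportions in the learning set mirror those in y.
strat_mccv_split <- function(y, frac) {
  idx_by_class <- split(seq_along(y), y)
  learn <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), round(frac * length(idx)))]
  }), use.names = FALSE)
  list(learn = learn, test = setdiff(seq_along(y), learn))
}

set.seed(1)
m <- mccv_split(n = 40, ntrain = 30)
length(m$test)                        # 10, i.e. n - ntrain
b <- bootstrap_split(n = 40)
length(intersect(b$learn, b$test))    # 0: the test set is disjoint from the draws
```

In the bootstrap case the size of the test set varies from draw to draw, because it depends on how many observations happen never to be selected.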

Details

  • When method="CV", niter gives the number of times the whole CV-procedure is repeated. The output matrix then has fold x niter rows. When method="MCCV" or method="bootstrap", niter is simply the number of considered learning sets.
  • Note that method="CV", fold=n is equivalent to method="LOOCV".
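The row count for repeated CV and the link to LOOCV can be checked with a small base R sketch. The helper repeated_cv is hypothetical and does not reproduce the CMA code; padding shorter learning sets with 0 is an assumption of this sketch, made so that all rows fit into one matrix.

```r
# Hypothetical helper (not part of CMA): repeat a CV with 'fold' groups
# 'niter' times and store one learning set per row.
repeated_cv <- function(n, fold, niter = 1) {
  one_cv <- function() {
    # Randomly assign each observation to one of the CV groups
    groups <- sample(rep(seq_len(fold), length.out = n))
    # The learning set for group k is everything outside group k
    lapply(seq_len(fold), function(k) which(groups != k))
  }
  sets <- unlist(replicate(niter, one_cv(), simplify = FALSE),
                 recursive = FALSE)
  width <- max(lengths(sets))
  # Pad with 0 (assumption of this sketch) so that all rows have equal length
  do.call(rbind, lapply(sets, function(s) c(s, rep(0L, width - length(s)))))
}

set.seed(2)
dim(repeated_cv(n = 40, fold = 5, niter = 3))   # 15 32: fold x niter rows
dim(repeated_cv(n = 40, fold = 40))             # 40 39: each row leaves out one observation
```

With fold equal to n, every group contains exactly one observation, so each learning set omits a single observation, which is the LOOCV scheme.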

    Value

    An object of class learningsets

    Author(s)

    Martin Slawski martin.slawski@campus.lmu.de

    Anne-Laure Boulesteix http://www.slcmsr.net/boulesteix

    References

    Braga-Neto, U.M., Dougherty, E.R. (2003). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374-380.

    Molinaro, A.M., Simon, R., Pfeiffer, R.M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301-3307.

    See Also

    learningsets, GeneSelection, tune, classification

    Examples

    # LOOCV
    loo <- GenerateLearningsets(n=40, method="LOOCV")
    show(loo)
    # five-fold-CV
    CV5 <- GenerateLearningsets(n=40, method="CV", fold=5)
    show(CV5)
    # MCCV
    mccv <- GenerateLearningsets(n=40, method = "MCCV", niter=3, ntrain=30)
    show(mccv)
    # Bootstrap
    boot <- GenerateLearningsets(n=40, method="bootstrap", niter=3)
    # stratified five-fold-CV
    set.seed(113)
    classlabels <- sample(1:3, size = 50, replace = TRUE, prob = c(0.3, 0.5, 0.2))
    CV5strat <- GenerateLearningsets(y = classlabels, method="CV", fold=5, strat = TRUE)
    show(CV5strat)
    

    [Package CMA version 1.0.0 Index]