MLearn-methods {MLInterfaces}    R Documentation
unified interface to machine learning methods – new approach (August 2005)
Use of MLInterfaces methods to date (version 1.1.3) involves a large number of generics with names indicating the method to be employed. For example, knnB() is used to apply k-nearest neighbors analysis to an instance of the exprSet class. In this design, the generic has to "know" about the parameters of the underlying R function implementing the method of interest, and set defaults. This is a somewhat fragile design, in that changes to the calling sequences of underlying R functions can break the interfaces defined here.
A new, fully backwards-compatible design is now being introduced. Here there is one generic, MLearn. Its parameters are formula, data, method, and trainInd; additional parameters to underlying implementations of machine learning algorithms are passed through the ... argument. This new design allows use of ordinary formulas and data frames as well as exprSet instances.
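The calling convention described above can be illustrated with a minimal base-R sketch. The function toyLearn below is hypothetical and is not part of MLInterfaces. It only shows how a single entry point might take formula, data, method, and trainInd and forward everything else through ... to the fitting function named by method, so that the wrapper does not need to know that function's own parameters.

```r
# Hypothetical illustration only: toyLearn is NOT an MLInterfaces function.
toyLearn <- function(formula, data, method, trainInd, ...) {
  fitFun <- match.fun(method)   # look up the fitting function by name
  # Fit on the training rows; anything in '...' goes straight to fitFun.
  fit <- fitFun(formula, data = data[trainInd, , drop = FALSE], ...)
  # Predict on the rows that were held out of training.
  test <- data[-trainInd, , drop = FALSE]
  list(fit = fit, testPredictions = predict(fit, newdata = test))
}

data(iris)
set.seed(1)
tinds <- sample(nrow(iris), 45)

# 'family' is not an argument of toyLearn; it reaches glm() through '...'.
out <- toyLearn(Sepal.Length ~ Petal.Length, iris, "glm", tinds,
                family = gaussian())
length(out$testPredictions)   # one prediction per held-out row
```

Because the wrapper never names family, a change to the argument list of glm() would not require a change to toyLearn, which is the robustness the new design is after.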
The machine learning methods accommodated in the new design are described before the examples below.
There is no subset parameter. Because MLInterfaces wishes to inhibit the use of resubstitution estimates of generalization error, all MLInterfaces procedures require that the input data be decomposed into training and test subsets. If you want the behavior of a subset parameter setting, please form the subset manually prior to invoking MLearn.
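As a sketch of forming the subset manually, the base-R code below restricts iris to two species before any learner is called. The object names irisSub and tinds are illustrative, and the commented MLearn call simply follows the calling convention documented on this page; it is not run here.

```r
data(iris)

# Form the subset by hand, since there is no 'subset' argument to rely on.
irisSub <- iris[iris$Species != "setosa", ]
irisSub$Species <- droplevels(irisSub$Species)   # drop the now-empty level
rownames(irisSub) <- NULL                        # renumber rows 1..nrow

# Training indices must refer to rows of the subset, not of the full data.
set.seed(2)
tinds <- sample(nrow(irisSub), 30)

# The call would then follow the convention on this page (not run here):
# MLearn(Species ~ ., data = irisSub, method = "knn", trainInd = tinds)
```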
Possible values for method are described below, under "Machine learning resources available".
Parameter trainInd defines the indices of the records in the input dataset that are used for training; the remaining records are used as a test dataset for evaluation of the fitted learner.
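The split implied by trainInd can be sketched in base R as follows. The names trainInd, testInd, trainSet, and testSet are illustrative, and this is only a picture of the partition, not the code MLInterfaces uses internally.

```r
data(iris)
set.seed(3)
trainInd <- sample(nrow(iris), 45)                   # rows used for training
testInd  <- setdiff(seq_len(nrow(iris)), trainInd)   # every remaining row

trainSet <- iris[trainInd, ]
testSet  <- iris[testInd, ]

# The two parts are disjoint and together cover the whole dataset.
stopifnot(length(intersect(trainInd, testInd)) == 0,
          nrow(trainSet) + nrow(testSet) == nrow(iris))
```

Evaluating the learner only on testSet is what avoids a resubstitution estimate, in which the same records would be used both to fit and to score the model.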
For data of the exprSet class:
Parameter formula is to be the name, given as a character string, of a variable in the pData slot of the exprSet's phenoData. In general this will be a factor encoding a categorical variable.
Parameter data is to be an instance of class exprSet.
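To make the roles of these two parameters concrete without depending on Biobase, the sketch below uses plain base-R stand-ins: a genes-by-samples matrix in place of the expression values and a per-sample data frame in place of pData. All object names are hypothetical, and treating trainInd as an index over samples is an assumption based on the genomics example on this page.

```r
# Stand-in objects only: a genes-by-samples matrix and a per-sample data
# frame, mimicking the expression values and pData of an exprSet.
set.seed(4)
exprs <- matrix(rnorm(50 * 10), nrow = 50,
                dimnames = list(paste0("gene", 1:50), paste0("s", 1:10)))
pheno <- data.frame(ALL.AML = factor(rep(c("ALL", "AML"), each = 5)),
                    row.names = colnames(exprs))

classVar <- "ALL.AML"            # plays the role of the 'formula' argument
trainInd <- c(1:3, 6:8)          # samples used for training (assumed meaning)

labels <- pheno[[classVar]]      # class labels looked up by variable name
trainX <- t(exprs[, trainInd])   # samples become rows for the learner
testX  <- t(exprs[, -trainInd])  # remaining samples form the test set
table(labels[trainInd])
```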
Possible values for method are described below, under "Machine learning resources available".
Parameter trainInd defines the indices of the records in the input dataset that are used for training; the remaining records are used as a test dataset for evaluation of the fitted learner.
Any additional parameters to be set for method can be passed in after trainInd. For example, if "nnet" is supplied as method, the parameter size must be set and passed in.
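The reason size must be supplied can be seen by calling the underlying function directly: nnet() in the nnet package has no default for size, the number of hidden units. The sketch below calls nnet itself rather than MLearn, and it is guarded so that it does nothing if the nnet package is not installed. The seed and the choice of 45 training rows are arbitrary.

```r
data(iris)
set.seed(5)
tinds <- sample(nrow(iris), 45)

if (requireNamespace("nnet", quietly = TRUE)) {
  # 'size' has no default in nnet(), so it must always be given;
  # 'decay' is optional and is passed here only as an example.
  fit <- nnet::nnet(Species ~ ., data = iris[tinds, ], size = 4,
                    decay = 0.01, trace = FALSE)
  # Predict class labels for the rows not used in training.
  pred <- predict(fit, iris[-tinds, ], type = "class")
  print(table(predicted = pred, truth = iris$Species[-tinds]))
}
```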
MLearn returns an instance of class MLOutput (see MLOutput-class).
Here we provide links to tools that may be identified in the method parameter. Just use a string naming the method. For each method, we may have a "Do not pass parameters" clause, because the interface constructs values of these parameters on the basis of parameters set in the call to MLearn. You may (and in some cases must) set and pass parameters not listed in the "Do not pass" list.
library(MLInterfaces)
data(iris)
# 45 randomly chosen rows are used for training
tinds <- sample(1:150, 45)
# size is required by nnet; decay is optional
MLearn(Species~., data=iris, method="nnet", tinds, size=4, decay=.01 )
MLearn(Species~., data=iris, method="knn", tinds )
rfdemo <- MLearn(Species~., data=iris, method="randomForest", tinds, importance=TRUE )
plot(getVarImp(rfdemo))
# genomics examples
library(golubEsets)
MLearn("ALL.AML", golubMerge[1:50,], "rpart", 1:36 )
MLearn("ALL.AML", golubMerge[1:50,], "knn", 1:36 )