MLearn-methods {MLInterfaces}    R Documentation

unified interface to machine learning methods

Description

unified interface to machine learning methods – new approach (August 2005)

Introduction

Use of MLInterfaces methods to date (version 1.1.3) involves a large number of generics whose names indicate the method to be employed. For example, knnB() is used to apply k-nearest neighbors analysis to an instance of the exprSet class. In this design, each generic has to "know" about the parameters of the underlying R function implementing the method of interest, and must set their defaults. This design is somewhat fragile: changes to the calling sequences of the underlying R functions can break the interfaces defined here.

A new, fully backward-compatible design is now being introduced. It uses a single generic, MLearn, whose parameters are formula, data, method, and trainInd; additional parameters for the underlying implementations of machine learning algorithms are passed through the ... argument. This design allows ordinary formulas and data frames to be used, as well as exprSet instances.
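The single-generic idea can be illustrated with a toy dispatcher. This is a hypothetical sketch, not MLInterfaces code: mlearn_sketch() and the learners table are illustrative names, and for brevity they wrap two regression fitters from the stats package (lm and loess) rather than the classifiers MLearn supports. It shows the two ingredients described in the paragraph above: the learner is selected by a string, and anything in ... is forwarded unchanged to the underlying function.

```r
# Hypothetical sketch of a single-generic interface; NOT the MLInterfaces code.
# Learners are looked up by name; any extra arguments travel through `...`.
learners <- list(
  lm    = stats::lm,
  loess = stats::loess
)

mlearn_sketch <- function(formula, data, method, trainInd, ...) {
  fit_fun <- learners[[method]]
  if (is.null(fit_fun)) stop("unknown method: ", method)
  # Rows named in trainInd are used for fitting; all other rows form the test set.
  train <- data[trainInd, , drop = FALSE]
  test  <- data[-trainInd, , drop = FALSE]
  # `...` is passed on untouched, so the wrapper needs no knowledge of
  # parameters such as loess()'s `span`.
  fit <- fit_fun(formula, data = train, ...)
  list(fit = fit, testPredictions = predict(fit, newdata = test))
}

r1 <- mlearn_sketch(mpg ~ wt, data = mtcars, method = "lm", trainInd = 1:20)
r2 <- mlearn_sketch(mpg ~ wt, data = mtcars, method = "loess",
                    trainInd = 1:20, span = 0.9)
```

Because the wrapper never names span, a change to loess()'s argument list would not require any change to mlearn_sketch(), which is the robustness argument made for the new design.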

The machine learning methods accommodated in the new design are described before the examples below.

Methods

formula = "formula", data = "data.frame", method = "character", trainInd = "numeric"
The behavior with this signature is comparable to that of the standard R modeling tools, except in the handling of the common subset parameter. Because MLInterfaces aims to discourage resubstitution estimates of generalization error, all MLInterfaces procedures require the input data to be decomposed into training and test subsets. If you want the effect of a subset setting, form the subset manually before invoking MLearn.
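Forming the subset yourself takes only base R. The minimal sketch below assumes, for illustration, that you want to drop the setosa class from iris before training; the MLearn call is left as a comment and simply mirrors the signature described above.

```r
data(iris)
# Form the subset manually, as a subset= argument would in lm() or glm().
iris2 <- subset(iris, Species != "setosa")
# Drop the now-empty "setosa" level so the response has only two classes.
iris2$Species <- droplevels(iris2$Species)
set.seed(1)
# Training indices are positions in the subsetted data frame, not in iris.
tinds <- sample(nrow(iris2), 30)
# MLearn(Species ~ ., data = iris2, method = "knn", trainInd = tinds)
```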

Possible values for method are described below, under "Machine learning resources available".

Parameter trainInd defines the indices of the records in the input dataset that are used for training; remaining records are used as a test dataset for evaluation of the fitted learner.
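The partition implied by trainInd can be written out in base R: the rows it names form the training set, and the complement forms the test set. The stratified variant at the end is an optional alternative, not something MLearn requires; it draws the same number of training rows from each class.

```r
data(iris)
set.seed(123)
# Simple random split: 45 training rows; the remaining 105 are held out.
tinds <- sample(1:150, 45)
train <- iris[tinds, ]
test  <- iris[-tinds, ]

# Stratified alternative: 15 training rows from each species.
by_class <- split(seq_len(nrow(iris)), iris$Species)
strat <- unlist(lapply(by_class, sample, size = 15), use.names = FALSE)
```

Either tinds or strat could then be supplied as trainInd.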

formula = "character", data = "exprSet", method = "character", trainInd = "numeric"
This method works for instances of the exprSet class.

Parameter formula must be a character string naming a variable in the pData slot of the exprSet's phenoData. Typically this variable is a factor encoding a categorical variable.

Parameter data must be an instance of class exprSet.
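Working with an exprSet requires Biobase, so the sketch below uses plain stand-ins to show conceptually what naming a phenoData variable amounts to: samples become rows, genes become predictors, and the named phenotype becomes the response of an ordinary formula. The objects expr and pheno are hypothetical toy data, and this is an analogy for illustration, not a description of how MLInterfaces converts an exprSet internally.

```r
set.seed(42)
# Hypothetical stand-ins: a 5-gene x 8-sample expression matrix ...
expr <- matrix(rnorm(40), nrow = 5,
               dimnames = list(paste0("g", 1:5), paste0("s", 1:8)))
# ... and a phenotype table with one row per sample.
pheno <- data.frame(ALL.AML = factor(rep(c("ALL", "AML"), each = 4)),
                    row.names = colnames(expr))
resp <- "ALL.AML"   # the character value that would be passed as formula
# Samples become rows, genes become columns, the phenotype is appended.
df <- data.frame(t(expr), pheno[resp])
# The response name turns into "response ~ all other columns".
fml <- as.formula(paste(resp, "~ ."))
```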

Possible values for method are described below, under "Machine learning resources available".

Parameter trainInd defines the indices of the records in the input dataset that are used for training; remaining records are used as a test dataset for evaluation of the fitted learner.

Any additional parameters for method can be passed after trainInd. For example, if "nnet" is supplied as method, the parameter size (the number of hidden units) must be set and passed in, because nnet() provides no default for it.
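The requirement comes from the underlying function itself: in the nnet package, nnet() has no default for size, while decay (weight decay) defaults to 0. The sketch below calls nnet() directly on an iris split to show this. It assumes the nnet package, which is distributed with R as a recommended package, and corresponds to passing size = 4, decay = 0.01 through MLearn as in the examples.

```r
library(nnet)
data(iris)
set.seed(1)
tinds <- sample(1:150, 45)
# size is mandatory; decay is optional. trace = FALSE silences the
# optimizer's progress output.
fit <- nnet(Species ~ ., data = iris[tinds, ], size = 4, decay = 0.01,
            trace = FALSE)
pred <- predict(fit, newdata = iris[-tinds, ], type = "class")
# Leaving out size makes nnet() itself stop with an error.
err <- tryCatch(nnet(Species ~ ., data = iris[tinds, ], trace = FALSE),
                error = function(e) conditionMessage(e))
```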

Value

An instance of class MLOutput (documented under MLOutput-class).

Machine learning resources available

Here we provide links to tools that may be identified in the method parameter. To select a tool, supply a string naming it as method. Some methods have a "Do not pass parameters" clause, listing parameters you should not pass because the interface constructs their values from the parameters set in the call to MLearn. You may (and in some cases must) set and pass parameters that are not on the "Do not pass" list.
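To illustrate why an interface constructs some arguments itself, consider knn() from the class package (a recommended package distributed with R), whose signature is knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE). The training matrix, test matrix and class labels all follow from formula, data and trainInd, so they are the kind of argument a "Do not pass" clause would cover, while k is a tuning setting a user would choose. This mapping is an assumption for illustration; the sketch calls knn() directly rather than through MLearn.

```r
library(class)
data(iris)
set.seed(2)
tinds <- sample(1:150, 45)
x <- iris[, 1:4]                       # numeric predictors only
# train, test and cl are all derived from the data and the training indices,
# while k is a user-chosen setting.
pred <- knn(train = x[tinds, ], test = x[-tinds, ],
            cl = iris$Species[tinds], k = 3)
acc <- mean(pred == iris$Species[-tinds])   # test-set accuracy
```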

Examples

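library(MLInterfaces)
data(iris)
set.seed(1234)                  # make the random training/test split reproducible
tinds <- sample(1:150, 45)      # 45 training rows; the other 105 form the test set
MLearn(Species~., data=iris, method="nnet", tinds, size=4, decay=.01 )
MLearn(Species~., data=iris, method="knn", tinds )
rfdemo <- MLearn(Species~., data=iris, method="randomForest", tinds, importance=TRUE )
plot(getVarImp(rfdemo))
# genomics examples
library(golubEsets)
data(golubMerge)
# response is the phenoData variable "ALL.AML"; use the first 50 genes
# and train on the first 36 samples
MLearn("ALL.AML", golubMerge[1:50,], "rpart", 1:36 )
MLearn("ALL.AML", golubMerge[1:50,], "knn", 1:36 )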

[Package MLInterfaces version 1.2.1 Index]