fs.absT {MLInterfaces}    R Documentation

support for feature selection in cross-validation

Description

Helper functions that build feature selection closures, so that feature selection can be repeated within each cross-validation iteration.

Usage

fs.absT(N)
fs.probT(p)
fs.topVariance(p)

Arguments

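N number of features to retain; features are ordered by descending absolute value of the two-sample t statistic, and the top N are used.
p cumulative probability (in (0,1)). For fs.probT, features whose absolute t statistic lies above the corresponding quantile of the distribution of absolute t statistics are retained. For fs.topVariance, the threshold is taken in the distribution of feature variances (see Note).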
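The two t-based rules can be illustrated in base R. This is a minimal sketch of the selection rules as described above, not the package code: it assumes an equal-variance two-sample t statistic, uses quantile() for the threshold, and borrows two species from the built-in iris data as a stand-in two-class problem.

```r
# Two-class stand-in data: versicolor vs. virginica from iris
two <- droplevels(subset(iris, Species != "setosa"))
g <- two$Species
x <- two[, 1:4]

# Absolute two-sample t statistic for each feature (equal variances assumed)
abs_t <- vapply(x, function(v) {
  tt <- t.test(v[g == levels(g)[1]], v[g == levels(g)[2]], var.equal = TRUE)
  abs(unname(tt$statistic))
}, numeric(1))

# Rule in the style of fs.absT(2): keep the 2 features with largest |t|
top2 <- names(sort(abs_t, decreasing = TRUE))[1:2]

# Rule in the style of fs.probT(0.5): keep features whose |t| lies above
# the 0.5 quantile of all |t| values
above <- names(abs_t)[abs_t > quantile(abs_t, 0.5)]
```

With four features, both rules retain the two petal measurements here; in general the quantile rule retains a number of features that depends on p and on the total feature count.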

Details

Each of these functions returns a function (a closure) that is passed as a parameter to xvalSpec in applications of MLearn.

Value

A function is returned. When called with a formula and a data frame, it returns a formula consisting of the selected features, for use in the application of MLearn.

Note

The functions fs.absT and fs.probT are two examples of approaches to embedded feature selection that make sense for two-sample prediction problems. For selection based on linear models or other discrimination measures, you will need to create your own selection helper, following the code in these functions as examples.
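As a rough sketch of what such a custom helper might look like, the following base R code follows the same pattern: an outer function fixes the tuning value, and the returned function maps a formula and a data frame to a reduced formula. The name fs.topF and the choice of a one-way ANOVA F statistic are assumptions made for illustration; they are not part of MLInterfaces. The sketch also assumes that all predictors are plain numeric columns with syntactic names.

```r
# Hypothetical helper (not in MLInterfaces): retain the N features with the
# largest one-way ANOVA F statistic against the response.
fs.topF <- function(N) function(formula, data) {
  mf <- model.frame(formula, data)
  y <- model.response(mf)
  respname <- names(mf)[1]          # response is the first column of mf
  x <- mf[, -1, drop = FALSE]
  # keep numeric predictors only
  x <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  # F statistic from regressing each feature on the class label
  fstat <- vapply(x, function(v) anova(lm(v ~ y))[1, "F value"], numeric(1))
  keep <- names(sort(fstat, decreasing = TRUE))[seq_len(min(N, length(fstat)))]
  reformulate(keep, response = respname)
}

# Usage on a built-in three-class data set
sel <- fs.topF(2)
f_all <- sel(Species ~ ., iris)
f_sub <- sel(Species ~ ., iris[-(1:10), ])  # emulate a training subset
```

Because the F statistic is not restricted to two groups, a helper of this kind can be applied to multi-class responses, which the t-based selectors are not designed for.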

fs.topVariance excludes features whose marginal variance over all samples does not exceed the pth percentile of the distribution of variances over all features.
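The variance rule can be illustrated with a few lines of base R. This is a sketch of the rule as stated, not the package implementation; it assumes p is supplied on the probability scale, as for fs.probT, and that the percentile is computed with quantile().

```r
# Marginal variance of each feature over all samples
feats <- iris[, 1:4]
vars <- vapply(feats, var, numeric(1))

# Threshold at the p-th quantile of the feature variances
p <- 0.5
thresh <- quantile(vars, p)

# Features whose variance exceeds the threshold are retained
keep <- names(vars)[vars > thresh]
```

Note that this filter does not use the class labels at all, unlike the t-based selectors.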

Author(s)

VJ Carey <stvjc@channing.harvard.edu>

See Also

MLearn

Examples

# we will demonstrate this procedure with the crabs data (from package MASS).
library(MLInterfaces)
library(MASS)
data(crabs)
# first, create the closure to pick 3 features
demFS = fs.absT(3)
# run it on the entire dataset with features excluding sex
demFS(sp~.-sex, crabs)
# emulate cross-validation by excluding last 50 records
demFS(sp~.-sex, crabs[1:150,])
# emulate cross-validation by excluding first 50 records; different features are retained
demFS(sp~.-sex, crabs[51:200,])

[Package MLInterfaces version 1.14.1 Index]