logicFS {logicFS}    R Documentation

Feature Selection with Logic Regression

Description

Identification of interesting interactions between binary variables using logic regression. Currently, only the classification and the logistic regression approaches of logreg are available.

Usage

## S3 method for class 'formula':
logicFS(formula, data, ...)

## Default S3 method:
logicFS(x, y, B = 100, ntrees = 1, nleaves = 8, glm.if.1tree = FALSE, 
  replace = TRUE, sub.frac = 0.632, anneal.control = logreg.anneal.control(), 
  prob.case = 0.5, addMatImp = TRUE, rand = NULL, ...)

Arguments

formula an object of class formula describing the model that should be fitted
data a data frame containing the variables in the model. Each column of data must correspond to a binary variable (coded by 0 and 1), and each row to an observation
x a matrix consisting of 0's and 1's. Each column must correspond to a binary variable and each row to an observation
y a vector of 0's and 1's containing the class labels of the observations
B an integer specifying the number of iterations
ntrees an integer indicating how many trees should be used. If ntrees is larger than 1, the logistic regression approach of logic regression will be used. If ntrees is 1, then by default the classification approach of logic regression will be used (see glm.if.1tree)
nleaves a numeric value specifying the maximum number of leaves used in all trees combined. For details, see the help page of the function logreg of the package LogicReg
glm.if.1tree if ntrees is 1 and glm.if.1tree is TRUE, the logistic regression approach of logic regression is used instead of the classification approach. Ignored if ntrees is not 1
replace should sampling of the cases be done with replacement? If TRUE, a bootstrap sample of size length(y) is drawn from the length(y) observations in each of the B iterations. If FALSE, ceiling(sub.frac * length(y)) of the observations are drawn without replacement in each iteration
sub.frac a proportion specifying the fraction of the observations that are used in each iteration to build a classification rule if replace = FALSE. Ignored if replace = TRUE
anneal.control a list containing the parameters for simulated annealing. See the help of the function logreg.anneal.control in the LogicReg package
prob.case a numeric value between 0 and 1. If the outcome of the logistic regression, i.e. the predicted probability, for an observation is larger than prob.case, this observation will be classified as a case (or 1)
addMatImp should the matrix containing the improvements due to the prime implicants in each of the iterations be added to the output? (For each of the prime implicants, the importance is computed as the average over the B improvements.) Must be set to TRUE if standardized importances should be computed using vim.norm, or if permutation-based importances should be computed using vim.perm
rand numeric value. If specified, the random number generator will be set into a reproducible state
... for the formula method, optional parameters to be passed to the low level function logicFS.default. Otherwise, ignored
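
The sampling and classification rules described for replace, sub.frac and prob.case can be illustrated with a few lines of base R. This is only a sketch of the rules as stated above, not code taken from the logicFS package; the sample size n and the predicted probabilities pred.prob are hypothetical values chosen for illustration.

```r
# Illustrative base-R sketch; not code from the logicFS package.
set.seed(123)        # fixed seed so the sketch is reproducible
n <- 50              # hypothetical number of observations, i.e. length(y)
sub.frac <- 0.632

# replace = TRUE: bootstrap sample of size n drawn with replacement
boot.idx <- sample(n, n, replace = TRUE)

# replace = FALSE: ceiling(sub.frac * n) observations drawn without replacement
sub.idx <- sample(n, ceiling(sub.frac * n), replace = FALSE)

length(boot.idx)     # 50
length(sub.idx)      # 32, since ceiling(0.632 * 50) = 32

# prob.case: an observation is classified as a case (1) only if its
# predicted probability is strictly larger than prob.case
prob.case <- 0.5
pred.prob <- c(0.2, 0.5, 0.7)                     # hypothetical probabilities
pred.class <- as.integer(pred.prob > prob.case)   # 0 0 1
```

Note that a predicted probability exactly equal to prob.case is classified as 0 under the "larger than" rule.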

Value

An object of class logicFS containing

primes the prime implicants
vim the importance of the prime implicants
prop the proportion of logic regression models that contain the prime implicants
type the type of model (1: classification, 3: logistic regression)
param further parameters (if addInfo = TRUE)
mat.imp the matrix containing the improvements if addMatImp = TRUE, otherwise, NULL
measure the name of the used importance measure
threshold NULL
mu NULL
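
As noted for addMatImp, the importance of a prime implicant is the average of its improvements over the B iterations. The following base-R sketch shows that averaging on a made-up matrix. The layout (rows as prime implicants, columns as iterations), the implicant names and the numbers are assumptions for illustration and are not taken from the package.

```r
# Hypothetical improvement matrix (assumed layout: one row per prime
# implicant, one column per iteration); values are invented.
mat.imp <- matrix(c(4, 2, 0,
                    1, 0, 2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("X1 & !X2", "X3"),
                                  paste0("iter", 1:3)))

# Importance as the average improvement over the iterations
vim <- rowMeans(mat.imp)
vim   # X1 & !X2: 2, X3: 1
```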

Author(s)

Holger Schwender, holger.schwender@udo.edu

References

Ruczinski, I., Kooperberg, C., LeBlanc M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.

Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. To appear in Biostatistics

See Also

plot.logicFS, logic.bagging

Examples

## Not run: 
   # Load data.
   data(data.logicfs)
   
   # For logic regression and hence logicFS, the variables must
   # be binary. data.logicfs, however, contains categorical data 
   # with realizations 1, 2 and 3. Such data can be transformed 
   # into binary data by
   bin.snps<-make.snp.dummy(data.logicfs)
   
   # To speed up the search for the best logic regression models
   # only a small number of iterations is used in simulated annealing.
   my.anneal<-logreg.anneal.control(start=2,end=-2,iter=10000)
   
   # Feature selection using logic regression is then done by
   log.out<-logicFS(bin.snps,cl.logicfs,B=20,nleaves=10,
       rand=123,anneal.control=my.anneal)
   
   # The output of logicFS can be printed
   log.out
   
   # One can specify another number of interactions that should be
   # printed, here, e.g., 15.
   print(log.out,topX=15)
   
   # The variable importance can also be plotted.
   plot(log.out)
   
   # And the original variable names are displayed in
   plot(log.out,coded=FALSE)
## End(Not run)

[Package logicFS version 1.8.0 Index]