logic.bagging {logicFS}    R Documentation

Bagged Logic Regression

Description

A bagging version of logic regression. Currently available for the classification, the linear regression, and the logistic regression approaches of logreg.
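Bagging fits one model to each of B resampled versions of the data and then combines the B models. The following base-R sketch illustrates the idea under clearly stated assumptions. logreg is not reproduced here, so a hypothetical single-variable rule (fit_stump, which predicts the class by the one binary variable or its complement that fits the bag best) stands in for the logic regression model. Majority voting is used as a generic aggregation rule; see predict.logicBagg for how the package itself combines its B models.

```r
# Hypothetical stand-in for a logic regression fit (NOT logreg): choose the
# single binary variable, or its complement, with the lowest training error.
fit_stump <- function(x, y) {
  best <- list(err = Inf, j = 1L, negate = FALSE)
  for (j in seq_len(ncol(x))) {
    for (negate in c(FALSE, TRUE)) {
      pred <- if (negate) 1L - x[, j] else x[, j]
      err <- mean(pred != y)
      if (err < best$err) best <- list(err = err, j = j, negate = negate)
    }
  }
  best
}

# Apply a fitted stump to (new) binary data.
predict_stump <- function(model, x) {
  pred <- x[, model$j]
  if (model$negate) 1L - pred else pred
}

# Bagging: fit one stump per bootstrap sample of the observations.
bag_stumps <- function(x, y, B = 25) {
  n <- nrow(x)
  lapply(seq_len(B), function(b) {
    inbag <- sample(n, n, replace = TRUE)   # bootstrap sample of size n
    fit_stump(x[inbag, , drop = FALSE], y[inbag])
  })
}

# Aggregate the B models by majority vote (an assumption, see lead-in).
predict_bag <- function(models, x) {
  votes <- sapply(models, predict_stump, x = x)  # n x B matrix of 0/1 votes
  as.integer(rowMeans(votes) > 0.5)
}

# Simulated binary data with the Boolean rule y = X2 OR X4.
set.seed(123)
x <- matrix(rbinom(200 * 5, 1, 0.5), ncol = 5)
y <- as.integer(x[, 2] == 1 | x[, 4] == 1)
models <- bag_stumps(x, y, B = 25)
mean(predict_bag(models, x) == y)  # training accuracy of the bagged vote
```

A single variable can only approximate a Boolean combination such as X2 OR X4; logic regression searches over such combinations (via simulated annealing), which is what the stand-in omits.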

Usage

## S3 method for class 'formula':
logic.bagging(formula, data, recdom = TRUE, ...)

## Default S3 method:
logic.bagging(x, y, B = 100, ntrees = 1, nleaves = 8, 
  glm.if.1tree = FALSE, replace = TRUE, sub.frac = 0.632,
  anneal.control = logreg.anneal.control(), oob = TRUE, 
  prob.case = 0.5, importance = TRUE, addMatImp = FALSE,
  rand = NULL, ...)

Arguments

formula an object of class formula describing the model that should be fitted
data a data frame containing the variables in the model. Except for the column comprising the response, each column of data must correspond to a binary variable (coded by 0 and 1) or a factor (for details on factors, see recdom), and each row must correspond to an observation. The response must be either binary (coded by 0 and 1) or continuous. If it is continuous, a linear model is fitted in each of the B iterations of logic.bagging. Otherwise, depending on ntrees (and glm.if.1tree), the classification or the logistic regression approach of logic regression is used
recdom a logical value or vector of length ncol(data) specifying whether a SNP should be transformed into two binary dummy variables coding for a recessive and a dominant effect. If recdom is TRUE (logical value), then all factors (variables) with three levels are coded by two dummy variables as described in make.snp.dummy. Each level of each of the other factors (including factors specifying a SNP that shows only two genotypes) is coded by one indicator variable. If recdom is FALSE (logical value), each level of each factor is coded by an indicator variable. If recdom is a logical vector, all factors corresponding to a TRUE entry of recdom are assumed to be SNPs and are transformed into the two binary variables described above. Each variable that corresponds to a TRUE entry of recdom (no matter whether recdom is a vector or a value) must be coded by the integers 1 (homozygous reference genotype), 2 (heterozygous), and 3 (homozygous variant)
x a matrix consisting of 0's and 1's. Each column must correspond to a binary variable and each row to an observation
y a numeric vector that either contains the class labels (coded by 0 and 1) of the observations if the classification or logistic regression approach of logic regression should be used, or the values of a continuous response if the linear regression approach should be used
B an integer specifying the number of iterations
ntrees an integer indicating how many trees should be used.
For a binary response: If ntrees is larger than 1, the logistic regression approach of logic regression is used. If ntrees is 1, the classification approach of logic regression is used by default (see glm.if.1tree).
For a continuous response: A linear regression model with ntrees trees is fitted in each of the B iterations
nleaves a numeric value specifying the maximum number of leaves used in all trees combined. See the help page of the function logreg of the package LogicReg for details
glm.if.1tree if ntrees is 1 and glm.if.1tree is TRUE, the logistic regression approach of logic regression is used instead of the classification approach. Ignored if ntrees is not 1 or the response is not binary
replace should the cases be sampled with replacement? If TRUE, a bootstrap sample of size length(y) is drawn from the length(y) observations in each of the B iterations. If FALSE, ceiling(sub.frac * length(y)) of the observations are drawn without replacement in each iteration
sub.frac a proportion specifying the fraction of the observations that are used in each iteration to build the logic regression model if replace = FALSE. Ignored if replace = TRUE
anneal.control a list containing the parameters for simulated annealing. See the help page of logreg.anneal.control in the LogicReg package
oob should the out-of-bag error rate (classification and logistic regression) or the out-of-bag root mean square prediction error (linear regression), respectively, be computed?
prob.case a numeric value between 0 and 1. If the outcome of the logistic regression, i.e. the class probability, for an observation is larger than prob.case, this observation is classified as a case (i.e., as 1)
importance should the measure of importance be computed?
addMatImp should the matrix containing the improvements due to the prime implicants in each of the iterations be added to the output? (For each of the prime implicants, the importance is computed by the average over the B improvements.) Must be set to TRUE if standardized importances should be computed using vim.norm, or if permutation-based importances should be computed using vim.perm
rand a numeric value. If specified, the random number generator is set into a reproducible state
... for the formula method, optional parameters to be passed to the low level function logic.bagging.default. Otherwise, ignored
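The arguments y, ntrees, and glm.if.1tree jointly determine which approach of logic regression is fitted in each iteration. The rules stated above can be summarized by the following sketch. choose_approach is a hypothetical helper written for illustration, not a logicFS function, and recognizing a binary response via y %in% c(0, 1) is an assumption.

```r
# Hypothetical helper (not part of logicFS) mirroring the documented rules
# for the arguments y, ntrees and glm.if.1tree.
choose_approach <- function(y, ntrees = 1, glm.if.1tree = FALSE) {
  binary <- all(y %in% c(0, 1))
  if (!binary) {
    return("linear regression")    # continuous response: ntrees trees per iteration
  }
  if (ntrees > 1 || glm.if.1tree) {
    return("logistic regression")  # several trees, or one tree with glm.if.1tree = TRUE
  }
  "classification"                 # binary response, one tree, default
}

choose_approach(c(0, 1, 1, 0))                       # "classification"
choose_approach(c(0, 1, 1, 0), ntrees = 2)           # "logistic regression"
choose_approach(c(0, 1, 1, 0), glm.if.1tree = TRUE)  # "logistic regression"
choose_approach(c(2.3, 0.7, 1.1))                    # "linear regression"
```

glm.if.1tree only changes the result when the response is binary and ntrees is 1, which matches the statement that it is ignored otherwise.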

Value

logic.bagging returns an object of class logicBagg containing

logreg.model a list containing the B logic regression models
inbagg a list specifying the B bootstrap samples (or subsamples if replace = FALSE)
vim an object of class logicFS (if importance = TRUE)
oob.error the out-of-bag error (if oob = TRUE)
... further parameters of the logic regression
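The components inbagg and oob.error are linked: the observations not drawn in an iteration form its out-of-bag set. The following base-R sketch illustrates this for a binary response under stated assumptions. draw_inbag and oob_error are hypothetical helpers, the class probabilities are simulated instead of coming from logreg fits, and averaging the out-of-bag probabilities before applying prob.case is an assumed aggregation rule that may differ from the one used by logic.bagging.

```r
# Hypothetical helper (not the package's code): indices of the observations
# drawn in one iteration, following the rules for replace and sub.frac.
draw_inbag <- function(n, replace = TRUE, sub.frac = 0.632) {
  if (replace) {
    sample(n, n, replace = TRUE)       # bootstrap sample of size n
  } else {
    sample(n, ceiling(sub.frac * n))   # subsample without replacement
  }
}

# Hypothetical OOB error for a binary response: average each observation's
# class probabilities over the iterations in which it was out of bag,
# classify with prob.case, and compare with the true labels.
oob_error <- function(prob, inbag_list, y, prob.case = 0.5) {
  n <- length(y)
  # oob[i, b] is TRUE if observation i was not drawn in iteration b
  oob <- sapply(inbag_list, function(idx) !(seq_len(n) %in% idx))
  n_oob <- rowSums(oob)
  used <- n_oob > 0                    # skip observations drawn in every iteration
  avg_prob <- rowSums(prob * oob)[used] / n_oob[used]
  pred <- as.integer(avg_prob > prob.case)
  mean(pred != y[used])
}

# Synthetic example: the probabilities are simulated, not fitted.
set.seed(1)
n <- 50
B <- 20
y <- rbinom(n, 1, 0.5)
inbag_list <- replicate(B, draw_inbag(n), simplify = FALSE)
prob <- matrix(plogis(rnorm(n * B, mean = ifelse(y == 1, 1, -1))), nrow = n)
oob_error(prob, inbag_list, y)
```

The default sub.frac = 0.632 is close to 1 - 1/e, the expected proportion of distinct observations in a bootstrap sample of size n when n is large. For a continuous response, the same out-of-bag sets would be used with the root mean square prediction error instead of thresholding at prob.case.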

Author(s)

Holger Schwender, holger.schwender@udo.edu

References

Ruczinski, I., Kooperberg, C., LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.

Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. Biostatistics, 9(1), 187-198.

See Also

predict.logicBagg, plot.logicBagg, logicFS

Examples

## Not run: 
   # Load data.
   data(data.logicfs)
   
   # For logic regression and hence logic.bagging, the variables must
   # be binary. data.logicfs, however, contains categorical data 
   # with realizations 1, 2 and 3. Such data can be transformed 
   # into binary data by
   bin.snps<-make.snp.dummy(data.logicfs)
   
   # To speed up the search for the best logic regression models
   # only a small number of iterations is used in simulated annealing.
   my.anneal<-logreg.anneal.control(start=2,end=-2,iter=10000)
   
   # Bagged logic regression is then performed by
   bagg.out<-logic.bagging(bin.snps,cl.logicfs,B=20,nleaves=10,
       rand=123,anneal.control=my.anneal)
   
   # The output of logic.bagging can be printed
   bagg.out
   
   # By default, also the importances of the interactions are 
   # computed
   bagg.out$vim
   
   # and can be plotted.
   plot(bagg.out)
   
   # The original variable names are displayed in
   plot(bagg.out,coded=FALSE)
   
   # New observations (here we assume that these observations are
   # in data.logicfs) are assigned to one of the classes by applying
   # the same binary coding that was used for fitting
   predict(bagg.out,make.snp.dummy(data.logicfs))
## End(Not run)
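The example above relies on make.snp.dummy to turn SNPs coded by 1, 2, and 3 into binary variables; recdom = TRUE in the formula method refers to the same coding. A base-R sketch of this recessive/dominant coding follows. snp_dummy is a hypothetical helper, the coding (dominant dummy = 1 for genotypes 2 and 3, recessive dummy = 1 for genotype 3) is an assumption consistent with the genotype coding described for recdom, and the .dom/.rec column names are illustrative and need not match those produced by make.snp.dummy.

```r
# Hypothetical base-R version of recessive/dominant dummy coding for SNPs
# coded 1 (homozygous reference), 2 (heterozygous), 3 (homozygous variant).
snp_dummy <- function(snps) {
  snps <- as.matrix(snps)
  if (!all(snps %in% 1:3)) stop("SNPs must be coded by 1, 2 and 3")
  if (is.null(colnames(snps))) colnames(snps) <- paste0("SNP", seq_len(ncol(snps)))
  out <- NULL
  for (j in seq_len(ncol(snps))) {
    dom <- as.integer(snps[, j] >= 2)  # dominant: at least one variant allele
    rec <- as.integer(snps[, j] == 3)  # recessive: two variant alleles
    out <- cbind(out, dom, rec)
  }
  colnames(out) <- paste0(rep(colnames(snps), each = 2), c(".dom", ".rec"))
  out
}

# Two SNPs observed in three subjects: SNP1 = 1, 2, 3 and SNP2 = 3, 1, 2.
snps <- matrix(c(1, 2, 3, 3, 1, 2), ncol = 2,
               dimnames = list(NULL, c("SNP1", "SNP2")))
snp_dummy(snps)
```

Each SNP yields two binary columns, so the resulting matrix has twice as many columns as there are SNPs and can be passed as x to the default method.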

[Package logicFS version 1.10.0 Index]