logicFS {logicFS}    R Documentation
Identification of interesting interactions between binary variables using logic regression. Currently, the classification, the linear regression, and the logistic regression approaches of logreg are available.
## S3 method for class 'formula':
logicFS(formula, data, recdom = TRUE, ...)

## Default S3 method:
logicFS(x, y, B = 100, ntrees = 1, nleaves = 8, glm.if.1tree = FALSE,
   replace = TRUE, sub.frac = 0.632, anneal.control = logreg.anneal.control(),
   prob.case = 0.5, addMatImp = TRUE, rand = NULL, ...)
formula: an object of class formula describing the model that should be fitted.
data: a data frame containing the variables in the model. Except for the column containing the response, each column of data must correspond to a binary variable (coded by 0 and 1) or a factor (for details, see recdom), and each row must correspond to an observation. The response must be either binary (coded by 0 and 1) or continuous. If it is continuous, a linear model is fitted in each of the B iterations of logic.bagging. Otherwise, depending on ntrees (and glm.if.1tree), the classification or the logistic regression approach of logic regression is used.
recdom: a logical value or a logical vector of length ncol(data) specifying whether a SNP should be transformed into two binary dummy variables coding for a recessive and a dominant effect. If TRUE (logical value), all factors (variables) with three levels are coded by two dummy variables as described in make.snp.dummy. Each level of each of the other factors (including factors specifying a SNP that shows only two genotypes) is coded by one indicator variable. If FALSE (logical value), each level of each factor is coded by an indicator variable. If recdom is a logical vector, all factors corresponding to an entry of recdom that is TRUE are assumed to be SNPs and are transformed into the two binary variables described above. Each variable that corresponds to an entry of recdom that is TRUE (no matter whether recdom is a vector or a value) must be coded by the integers 1 (homozygous reference genotype), 2 (heterozygous), and 3 (homozygous variant).
x: a matrix consisting of 0's and 1's. Each column must correspond to a binary variable and each row to an observation.
y: a numeric vector containing either the class labels (coded by 0 and 1) of the observations, if the classification or the logistic regression approach of logic regression should be used, or the values of a continuous response, if the linear regression approach should be used.
B: an integer specifying the number of iterations.
ntrees: an integer specifying how many trees should be used. For a binary response: if ntrees is larger than 1, the logistic regression approach of logic regression is used. If ntrees is 1, the classification approach of logic regression is used by default (see glm.if.1tree). For a continuous response: a linear regression model with ntrees trees is fitted in each of the B iterations.
nleaves: a numeric value specifying the maximum number of leaves used in all trees combined. For details, see the help page of the function logreg of the package LogicReg.
glm.if.1tree: if ntrees is 1 and glm.if.1tree is TRUE, the logistic regression approach of logic regression is used instead of the classification approach. Ignored if ntrees is not 1 or if the response is not binary.
replace: should sampling of the cases be done with replacement? If TRUE, a bootstrap sample of size n is drawn from the n observations in each of the B iterations. If FALSE, ceiling(sub.frac * n) of the n observations are drawn without replacement in each iteration.
sub.frac: a proportion specifying the fraction of the observations that are used in each iteration to fit a logic regression model if replace = FALSE. Ignored if replace = TRUE.
anneal.control: a list containing the parameters for simulated annealing. See the help page of the function logreg.anneal.control in the LogicReg package.
prob.case: a numeric value between 0 and 1. If the outcome of the logistic regression, i.e. the predicted probability, for an observation is larger than prob.case, this observation is classified as a case (or 1).
addMatImp: should the matrix containing the improvements due to the prime implicants in each of the iterations be added to the output? (For each of the prime implicants, the importance is computed as the average over the B improvements.) Must be set to TRUE if standardized importances should be computed using vim.norm, or if permutation-based importances should be computed using vim.perm.
rand: a numeric value. If specified, the random number generator is set into a reproducible state.
...: for the formula method, optional parameters to be passed to the low-level function logicFS.default. Otherwise, ignored.
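The recessive/dominant coding that recdom refers to can be sketched in base R as follows. This is a minimal illustration, not the code of make.snp.dummy: the helper snp_to_dummies() and the column names dominant and recessive are hypothetical, and the sketch assumes the usual convention that a dominant effect corresponds to carrying at least one variant allele (genotype 2 or 3) and a recessive effect to carrying two variant alleles (genotype 3).

```r
# Hypothetical helper (not part of logicFS): code one SNP, given by the
# integers 1 (homozygous reference), 2 (heterozygous) and 3 (homozygous
# variant), by two binary dummy variables. Assumed convention:
# dominant = at least one variant allele, recessive = two variant alleles.
snp_to_dummies <- function(snp) {
  if (!all(snp %in% 1:3)) stop("SNP must be coded by 1, 2 and 3")
  cbind(
    dominant  = as.integer(snp >= 2),  # genotypes 2 and 3
    recessive = as.integer(snp == 3)   # genotype 3 only
  )
}

snp_to_dummies(c(1, 2, 3, 2))
# dominant column: 0 1 1 1; recessive column: 0 0 1 0
```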
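Before calling the default method, it can help to check that x and y meet the requirements stated for these arguments. The helper check_xy() below is hypothetical and not part of logicFS; it assumes that a response taking only the values 0 and 1 is meant as class labels and that any other numeric response is meant as continuous.

```r
# Hypothetical helper (not part of logicFS): check x and y against the
# stated requirements and report how the response would be treated.
check_xy <- function(x, y) {
  if (!is.matrix(x) || !all(x %in% c(0, 1)))
    stop("x must be a matrix consisting of 0's and 1's")
  if (!is.numeric(y) || length(y) != nrow(x))
    stop("y must be a numeric vector with one entry per row of x")
  # Assumption: a response coded only by 0 and 1 is binary (class labels);
  # any other numeric response is continuous.
  if (all(y %in% c(0, 1))) "binary" else "continuous"
}

x <- matrix(c(0, 1, 1, 0, 1, 1), nrow = 3)
check_xy(x, c(0, 1, 1))         # "binary"
check_xy(x, c(2.5, -1.2, 0.3))  # "continuous"
```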
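The way the response in data (or y), ntrees and glm.if.1tree jointly determine the approach of logic regression can be summarized in a small function. model_type() is a hypothetical helper that only restates the rules given above; its return values use the codes listed for the type component of the returned object (1: classification, 2: linear regression, 3: logistic regression).

```r
# Hypothetical helper (not part of logicFS): restates which approach of
# logic regression is used. Codes as in the 'type' component:
# 1 = classification, 2 = linear regression, 3 = logistic regression.
model_type <- function(binary_response, ntrees = 1, glm.if.1tree = FALSE) {
  if (!binary_response) return(2L)            # continuous response: linear model
  if (ntrees > 1 || glm.if.1tree) return(3L)  # logistic regression approach
  1L                                          # one tree: classification approach
}

model_type(TRUE)                       # 1
model_type(TRUE, glm.if.1tree = TRUE)  # 3
model_type(TRUE, ntrees = 3)           # 3
model_type(FALSE, ntrees = 3)          # 2
```

Because the continuous case is checked first, glm.if.1tree has no effect for a continuous response, matching the statement that it is ignored in that case.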
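The sampling step controlled by replace and sub.frac can be sketched as below. draw_indices() is a hypothetical helper written for illustration; it only reproduces the sample sizes described above and is not the sampling code of logicFS.

```r
# Hypothetical helper (not part of logicFS): indices of the observations
# used in one iteration, following the description of replace and sub.frac.
draw_indices <- function(n, replace = TRUE, sub.frac = 0.632) {
  if (replace) {
    sample(n, n, replace = TRUE)                       # bootstrap sample of size n
  } else {
    sample(n, ceiling(sub.frac * n), replace = FALSE)  # subsample without replacement
  }
}

set.seed(1)
length(draw_indices(100))                   # 100
length(draw_indices(100, replace = FALSE))  # 64, i.e. ceiling(0.632 * 100)
```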
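For the logistic regression approach, the rule described for prob.case amounts to thresholding the predicted probabilities. classify_by_prob() is a hypothetical helper; an observation whose probability equals prob.case exactly is not classified as a case, since the description requires the probability to be larger than prob.case.

```r
# Hypothetical helper (not part of logicFS): classify observations as
# cases (1) if their predicted probability exceeds prob.case, else 0.
classify_by_prob <- function(p, prob.case = 0.5) {
  as.integer(p > prob.case)
}

classify_by_prob(c(0.2, 0.5, 0.51, 0.9))        # 0 0 1 1
classify_by_prob(c(0.2, 0.5, 0.51, 0.9), 0.25)  # 0 1 1 1
```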
An object of class logicFS containing:
primes: the prime implicants
vim: the importance of the prime implicants
prop: the proportion of logic regression models that contain the prime implicants
type: the type of model (1: classification, 2: linear regression, 3: logistic regression)
param: further parameters (if addInfo = TRUE)
mat.imp: the matrix containing the improvements if addMatImp = TRUE, otherwise NULL
measure: the name of the importance measure used
threshold: NULL
mu: NULL
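The relation between mat.imp and vim stated in the description of addMatImp (the importance of a prime implicant is the average over its B improvements) can be illustrated with made-up numbers. The layout used here, with one row per prime implicant and one column per iteration, is an assumption for the sketch and may differ from the actual layout of mat.imp; the prime implicant names are invented.

```r
# Made-up improvements (not logicFS output). Assumed layout: rows are
# prime implicants, columns are the B = 4 iterations.
mat.imp <- rbind(
  "X1 & !X2" = c(2, 0, 4, 2),
  "X3"       = c(1, 1, 0, 2)
)
# Importance of each prime implicant: average over the B improvements
vim <- rowMeans(mat.imp)
vim
# X1 & !X2: 2, X3: 1
```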
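Similarly, prop can be illustrated by counting in how many of the B logic regression models each prime implicant occurs. The prime implicants per model below are invented for the example, and the object names primes.per.model and all.primes are hypothetical.

```r
# Made-up results (not logicFS output): the prime implicants contained in
# each of B = 4 logic regression models.
primes.per.model <- list(
  c("X1 & !X2", "X3"),
  c("X3"),
  c("X1 & !X2"),
  c("X1 & !X2", "X4")
)
all.primes <- unique(unlist(primes.per.model))
# Proportion of models that contain each prime implicant
prop <- sapply(all.primes, function(p)
  mean(sapply(primes.per.model, function(m) p %in% m)))
prop
# X1 & !X2: 0.75, X3: 0.50, X4: 0.25
```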
Holger Schwender, holger.schwender@udo.edu
Ruczinski, I., Kooperberg, C., LeBlanc, M.L. (2003). Logic Regression. Journal of Computational and Graphical Statistics, 12, 475-511.
Schwender, H., Ickstadt, K. (2007). Identification of SNP Interactions Using Logic Regression. Biostatistics, 9(1), 187-198.
## Not run: 
library(logicFS)
library(LogicReg)  # provides logreg.anneal.control()

# Load data.
data(data.logicfs)

# For logic regression and hence logicFS, the variables must
# be binary. data.logicfs, however, contains categorical data
# with realizations 1, 2 and 3. Such data can be transformed
# into binary data by
bin.snps <- make.snp.dummy(data.logicfs)

# To speed up the search for the best logic regression models,
# only a small number of iterations is used in simulated annealing.
my.anneal <- logreg.anneal.control(start = 2, end = -2, iter = 10000)

# Feature selection using logic regression is then done by
log.out <- logicFS(bin.snps, cl.logicfs, B = 20, nleaves = 10,
   rand = 123, anneal.control = my.anneal)

# The output of logicFS can be printed.
log.out

# One can specify another number of interactions that should be
# printed, here, e.g., 15.
print(log.out, topX = 15)

# The variable importance can also be plotted.
plot(log.out)

# And the original variable names are displayed in
plot(log.out, coded = FALSE)
## End(Not run)