pdmClass {pdmclass}    R Documentation

Function to Classify Microarray Data using Penalized Discriminant Methods

Description

This function is used to classify microarray data. Since the underlying model fit is based on penalized discriminant methods, there is no need for a pre-filtering step to reduce the number of genes.

Usage

pdmClass(formula = formula(data), method = c("pls", "pcr", "ridge"),
data = sys.frame(sys.parent()), weights, theta, dimension = J - 1,
eps = .Machine$double.eps, ...)

Arguments

formula A symbolic description of the model to be fit. Details given below.
method One of "pls", "pcr", "ridge", corresponding to partial least squares, principal components regression and ridge regression.
data An optional data.frame that contains the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which pdmClass is called. Note that unlike most microarray analyses, in this case rows are samples and columns are genes.
weights An optional vector of sample weights. Defaults to 1.
theta An optional matrix of class scores, typically with fewer than J - 1 columns.
dimension The dimension of the solution. This will be no greater than J - 1 for partial least squares and ridge regression, and no greater than J for principal components regression. Defaults to J - 1 and J, respectively.
eps A threshold for excluding small discriminant variables. Defaults to .Machine$double.eps.
... Additional parameters to pass to method.

Details

The formula interface is identical to all other formula calls in R, namely Y ~ X, where Y is a numeric vector of class assignments and X is a matrix or data.frame containing the gene expression values. Note that unlike most microarray analyses, in this instance the columns of X are genes and rows are samples, so most calls will require something similar to Y ~ t(X).
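
The orientation requirement can be checked before fitting. The following is a minimal sketch in base R, assuming a toy genes-by-samples matrix X and a hypothetical two-class label vector Y (neither comes from the package); the pdmClass call itself is left as a comment because it needs the pdmclass package installed.

```r
# Toy expression matrix in the usual microarray layout:
# genes in rows, samples in columns (values are random, for illustration only).
set.seed(1)
X <- matrix(rnorm(5 * 6), nrow = 5, ncol = 6,
            dimnames = list(paste0("gene", 1:5), paste0("sample", 1:6)))

# One class assignment per sample (hypothetical two-class design).
Y <- rep(c(1, 2), each = 3)

# After transposing, rows are samples, so the row count matches length(Y).
stopifnot(nrow(t(X)) == length(Y))

# model.frame() shows what the formula Y ~ t(X) evaluates to:
# the second column is a samples-by-genes matrix.
mf <- model.frame(Y ~ t(X))
dim(mf[[2]])

# With pdmclass loaded, the same formula would be passed on, e.g.:
# fit <- pdmClass(Y ~ t(X), method = "pls")
```

If X is already stored with samples in rows, as x is in the Examples section, the transpose is not needed and Y ~ X can be used directly.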

Value

An object of class "fda". Use predict to extract discriminant variables, posterior probabilities or predicted class memberships. Other extractor functions are coef and plot.
The object has the following components:

percent.explained the percent between-group variance explained by each dimension (relative to the total explained).
values optimal scaling regression sum-of-squares for each dimension (see reference). The usual discriminant analysis eigenvalues are given by values / (1-values), which are used to define percent.explained.
means class means in the discriminant space. These are also scaled versions of the final thetas or class scores, and can be used in a subsequent call to fda (this only makes sense if some columns of theta are omitted; see the references).
theta.mod (internal) a class scoring matrix which allows predict to work properly.
dimension dimension of discriminant space.
prior class proportions for the training data.
fit fit object returned by method.
call the call that created this object (allowing it to be update-able)
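
The relationship between values and the discriminant eigenvalues can be worked through by hand. The sketch below uses made-up numbers for values (they are not output from pdmClass) and computes each dimension's share of the total; it is an illustration of the formula values / (1-values), not a copy of the package's internal code.

```r
# Hypothetical optimal scaling sums of squares for two dimensions.
values <- c(0.75, 0.5)

# Usual discriminant analysis eigenvalues: values / (1 - values).
eigenvalues <- values / (1 - values)

# Share of the explained between-group variance per dimension, in percent.
share <- 100 * eigenvalues / sum(eigenvalues)

eigenvalues  # 3 1
share        # 75 25
```

Applying cumsum() to share would give running totals instead of per-dimension shares, if that form is preferred.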

Author(s)

James W. MacDonald and Debashis Ghosh, based on fda in the mda package of Trevor Hastie and Robert Tibshirani, which was ported to R by Kurt Hornik, Brian D. Ripley, and Friedrich Leisch.

References

http://www.sph.umich.edu/~ghoshd/COMPBIO/POPTSCORE

"Flexible Discriminant Analysis by Optimal Scoring" by Hastie, Tibshirani and Buja, 1994, JASA, 1255-1270.

"Penalized Discriminant Analysis" by Hastie, Buja and Tibshirani, Annals of Statistics, 1995 (in press).

Examples

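library(pdmclass)
library(fibroEset)
data(fibroEset)
# Class labels: second column of the phenotype data, as a factor
y <- as.factor(pData(fibroEset)[,2])
# Transpose so that rows are samples and columns are genes
x <- t(exprs(fibroEset))
pdmClass(y ~ x)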

[Package pdmclass version 1.2.0 Index]