plgem.fit {plgem}    R Documentation

PLGEM Fitting and Evaluation

Description

Function for fitting PLGEM on a ‘data’ ExpressionSet and evaluating the goodness of fit, using the set of replicated samples identified by the ‘fit.condition’ condition of the ‘covariateNumb’ covariate. The range of gene expression values (or protein abundance levels) is partitioned into ‘p’ intervals, and the model is fitted at the ‘q’-th quantile of the standard deviations in each partition.

Usage

  plgem.fit(data, covariateNumb=1, fit.condition=1, p=10, q=0.5,
    fittingEval=FALSE, plot.file=FALSE, verbose=FALSE)

Arguments

data            an object of class ExpressionSet; see Details for important information on how the phenoData slot of this object will be interpreted by the function.
covariateNumb   integer (or coercible to integer); the covariate used to determine on which samples to fit the PLGEM.
fit.condition   integer (or coercible to integer); the condition used for PLGEM fitting, according to the order of unique values of the ‘covariateNumb’ covariate.
p               integer (or coercible to integer); number of intervals used to partition the expression value range.
q               numeric in [0,1]; the quantile of standard deviation used for PLGEM fitting.
fittingEval     logical; if TRUE, the fitting is evaluated by generating a diagnostic plot.
plot.file       logical; if TRUE, a png file is written to the current working directory.
verbose         logical; if TRUE, comments are printed out while running.

Details

plgem.fit fits a Power Law Global Error Model (PLGEM) to an expression set and optionally evaluates the quality of the fit. PLGEM describes the mathematical relationship between the standard deviation and the mean of gene expression values (or protein abundance levels) in a set of replicated microarray (or proteomics) samples, according to the following power law:

ln(modeledSpread) = PLGEMslope * ln(mean) + PLGEMintercept

It has been demonstrated that this model fits Affymetrix GeneChip datasets, as well as datasets of normalized spectral counts obtained by mass spectrometry-based proteomics (see References for details).
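
The partition-and-quantile idea behind the power law above can be sketched in base R. This is only an illustration on simulated data, not the code used inside plgem.fit: the simulated matrix, the variable names, and the choice of intervals holding equal numbers of genes are all assumptions made for the example.

```r
# Illustrative base-R sketch; NOT the plgem package's internal implementation.
set.seed(1)

# Simulate 2000 genes x 4 replicates whose spread follows a power law
# (assumed true slope 0.8 and intercept -1 on the natural-log scale).
n_genes   <- 2000
true_mean <- exp(runif(n_genes, min = 3, max = 9))
true_sd   <- exp(0.8 * log(true_mean) - 1)
expr      <- matrix(rnorm(n_genes * 4, mean = true_mean, sd = true_sd),
                    nrow = n_genes)

gene_mean <- rowMeans(expr)
gene_sd   <- apply(expr, 1, sd)

p <- 10    # number of intervals
q <- 0.5   # quantile of the standard deviations within each interval

# Assumption: intervals are chosen so that each holds the same number of genes.
log_mean <- log(gene_mean)
breaks   <- quantile(log_mean, probs = seq(0, 1, length.out = p + 1))
bin      <- cut(log_mean, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# One modelling point per interval: average ln(mean) and ln of the q-th quantile of sd.
mp_x <- as.vector(tapply(log_mean, bin, mean))
mp_y <- as.vector(tapply(gene_sd, bin,
                         function(s) log(quantile(s, probs = q, names = FALSE))))

# Straight-line fit on the log-log scale: ln(spread) = slope * ln(mean) + intercept
fit       <- lm(mp_y ~ mp_x)
slope     <- unname(coef(fit)[2])
intercept <- unname(coef(fit)[1])
c(slope = slope, intercept = intercept)
```

With q = 0.5 each modelling point sits at the median spread of its interval, so a handful of very noisy genes has little influence on the fitted line.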

The ‘covariateNumb’ covariate (the first one by default) of the phenoData of the ExpressionSet ‘data’ is expected to contain the necessary information about the experimental design. The values of this covariate must be sample labels, and samples are treated as replicates only if their labels are identical.
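
How a covariate and a condition index pick out replicates can be illustrated with a plain data frame. The annotation table below is hypothetical and merely stands in for the phenoData of an ExpressionSet; the selection logic is a sketch of the behaviour described in Arguments, not code taken from the package.

```r
# Hypothetical sample annotation, standing in for the phenoData of 'data'.
pheno <- data.frame(
  conditionName = c("C", "C", "C", "LPS.2h", "LPS.2h", "LPS.4h"),
  row.names     = paste0("sample", 1:6)
)

covariateNumb <- 1   # use the first covariate
fit.condition <- 1   # use the first unique value of that covariate

labels     <- as.character(pheno[[covariateNumb]])
conditions <- unique(labels)   # order of first appearance
replicates <- rownames(pheno)[labels == conditions[fit.condition]]
replicates
```

Here the three samples labelled "C" are selected; setting fit.condition to 2 would select the two "LPS.2h" samples instead.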

plgem.fit returns ‘SLOPE’ and ‘INTERCEPT’ of the above described power law; moreover it returns the Pearson's correlation coefficient (‘DATA.PEARSON’) of ln(mean) vs. ln(sd) in the original data, as well as the adjusted R squared (‘ADJ.R2.MP’) of the PLGEM fitted to the modelling points.
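
The two quality measures differ in what they are computed on: one uses every gene, the other only the modelling points. A rough base-R sketch on simulated log-scale values follows; the data, the decile binning, and all variable names are assumptions for illustration, not the package's own computation.

```r
# Illustrative sketch on simulated values; NOT the package's internal code.
set.seed(2)
log_mean <- runif(500, min = 3, max = 9)
log_sd   <- 0.8 * log_mean - 1 + rnorm(500, sd = 0.4)

# Pearson correlation of ln(mean) vs. ln(sd) over all genes
data_pearson <- cor(log_mean, log_sd, method = "pearson")

# Hypothetical modelling points: one per decile of ln(mean)
bin <- cut(log_mean, breaks = quantile(log_mean, probs = seq(0, 1, 0.1)),
           include.lowest = TRUE, labels = FALSE)
mp  <- data.frame(x = as.vector(tapply(log_mean, bin, mean)),
                  y = as.vector(tapply(log_sd, bin, median)))

# Adjusted R squared of the line fitted to the modelling points only
adj_r2_mp <- summary(lm(y ~ x, data = mp))$adj.r.squared
c(DATA.PEARSON = data_pearson, ADJ.R2.MP = adj_r2_mp)
```

Because the modelling points summarise each interval, the adjusted R squared on them is typically much closer to 1 than the gene-level correlation.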

If argument ‘fittingEval’ is TRUE, a four-panel diagnostic plot is generated to allow a visual check of the goodness of the PLGEM fit. The top-left panel shows the power law, characterized by ‘SLOPE’ and ‘INTERCEPT’. The top-right panel shows the distribution of model residuals. The bottom-left panel shows the contour plot of ranked residuals. The bottom-right panel shows the relationship between the distribution of observed residuals and the normal distribution. A good fit normally gives a horizontal, symmetric rank-plot and a near-normal distribution of residuals.
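
The kind of residual check described above can be approximated with base graphics. The sketch below assumes that a residual is ln(sd) minus ln(modeledSpread), and it draws only a histogram and a normal Q-Q plot rather than the four panels produced by the package; the data and the slope and intercept values are simulated placeholders.

```r
# Illustrative sketch; assumes residual = ln(sd) - ln(modeledSpread).
set.seed(3)
log_mean  <- runif(1000, min = 3, max = 9)
log_sd    <- 0.8 * log_mean - 1 + rnorm(1000, sd = 0.3)
slope     <- 0.8   # placeholder values standing in for a fitted model
intercept <- -1

res <- log_sd - (slope * log_mean + intercept)

# Distribution of residuals and comparison with the normal distribution
op <- par(mfrow = c(1, 2))
hist(res, breaks = 30, main = "Residuals", xlab = "ln(sd) - ln(modeledSpread)")
qqnorm(res)
qqline(res)
par(op)

# A value near zero means the residuals show no trend along the ranked means
rank_trend <- cor(rank(log_mean), res, method = "spearman")
rank_trend
```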

Value

plgem.fit returns a list of five numbers (see Details):

SLOPE           the slope of the fitted PLGEM.
INTERCEPT       the intercept of the fitted PLGEM.
DATA.PEARSON    the Pearson correlation coefficient of the linear model fitted on the original data.
ADJ.R2.MP       the adjusted R squared of PLGEM fitted on the modelling points.
FIT.CONDITION   the condition used for fitting PLGEM.
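
Given ‘SLOPE’ and ‘INTERCEPT’, the modeled spread at any mean expression level follows by inverting the logarithm in the power law from Details. The list below uses made-up numbers purely to show the arithmetic; it is not output from a real fit, and modeled_spread is a hypothetical helper, not a function of the package.

```r
# Made-up values with the same element names as the list returned by plgem.fit.
fit <- list(SLOPE = 0.8, INTERCEPT = -1, DATA.PEARSON = 0.9,
            ADJ.R2.MP = 0.98, FIT.CONDITION = 1)

# Hypothetical helper: ln(modeledSpread) = SLOPE * ln(mean) + INTERCEPT
modeled_spread <- function(fit, mean_expr) {
  exp(fit$SLOPE * log(mean_expr) + fit$INTERCEPT)
}

modeled_spread(fit, c(100, 1000))
```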

Author(s)

Mattia Pelizzola <mattia.pelizzola@gmail.com>

Norman Pavelka <nxp@stowers-institute.org>

References

Pavelka N, Pelizzola M, Vizzardelli C, Capozzoli M, Splendiani A, Granucci F, Ricciardi-Castagnoli P. A power law global error model for the identification of differentially expressed genes in microarray data. BMC Bioinformatics. 2004 Dec 17;5:203. http://www.biomedcentral.com/1471-2105/5/203

Pavelka N, Fournier ML, Swanson SK, Pelizzola M, Ricciardi-Castagnoli P, Florens L, Washburn MP. Statistical similarities between transcriptomics and quantitative shotgun proteomics data. Mol Cell Proteomics. 2007 Nov 19. http://www.mcponline.org/cgi/content/abstract/M700240-MCP200v1

See Also

plgem.obsStn, plgem.resampledStn, plgem.pValue, plgem.deg, run.plgem

Examples

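  library(plgem)
  data(LPSeset)  # example ExpressionSet provided by the plgem package
  # fit PLGEM with the defaults: first condition of the first covariate
  LPSfit <- plgem.fit(data=LPSeset, fittingEval=FALSE)
  # flatten the returned list into a plain vector for display
  sapply(LPSfit, function(x) return(as.vector(x)))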

[Package plgem version 1.14.0 Index]