plgem.fit {plgem}        R Documentation
Function for fitting PLGEM on the ‘data’ ExpressionSet and evaluating its goodness of fit, using the set of replicated samples identified by the ‘fit.condition’ condition of the ‘covariateNumb’ covariate. The range of gene expression values (or protein abundance levels) is partitioned into ‘p’ intervals, and the model is fit at the ‘q’-th quantile of the standard deviations in each partition.
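The partition-and-quantile procedure can be sketched in base R. This is a hypothetical illustration, not the plgem implementation: the function name `fit_power_law_sketch`, the use of equal-count (quantile-based) partitions of the mean, and the use of the median mean as each partition's x-coordinate are all assumptions.

```r
# Hypothetical sketch of a PLGEM-style fit; NOT the plgem package's code.
# `x` is a numeric matrix: rows = genes/proteins, columns = replicate samples.
fit_power_law_sketch <- function(x, p = 10, q = 0.5) {
  m <- rowMeans(x)
  s <- apply(x, 1, sd)
  # Keep rows with positive mean and sd so that log() is defined (assumption).
  keep <- m > 0 & s > 0
  m <- m[keep]
  s <- s[keep]
  # Partition the mean range into p intervals holding roughly equal numbers
  # of genes (assumption: quantile-based breaks).
  breaks <- quantile(m, probs = seq(0, 1, length.out = p + 1))
  bin <- cut(m, breaks = breaks, include.lowest = TRUE, labels = FALSE)
  # One modelling point per partition: median mean and q-th quantile of sd.
  mp_mean <- as.vector(tapply(m, bin, median))
  mp_sd <- as.vector(tapply(s, bin, quantile, probs = q))
  # Fit ln(sd) = slope * ln(mean) + intercept on the modelling points.
  fit <- lm(log(mp_sd) ~ log(mp_mean))
  list(SLOPE = unname(coef(fit)[2]), INTERCEPT = unname(coef(fit)[1]))
}
```

With replicate data constructed to lie exactly on a power law, for example `mu <- seq(10, 1000, length.out = 110); d <- exp(-1) * mu^0.8; fit_power_law_sketch(cbind(mu - d, mu, mu + d))`, the sketch recovers a slope close to 0.8 and an intercept close to -1.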
plgem.fit(data, covariateNumb=1, fit.condition=1, p=10, q=0.5, fittingEval=FALSE, plot.file=FALSE, verbose=FALSE)
data: an object of class ExpressionSet; see Details for important information on how the phenoData slot of this object is interpreted by the function.
covariateNumb: integer (or coercible to integer); the covariate used to determine on which samples to fit PLGEM.
fit.condition: integer (or coercible to integer); the condition used for PLGEM fitting, according to the order of unique values of the ‘covariateNumb’ covariate.
p: integer (or coercible to integer); the number of intervals used to partition the expression value range.
q: numeric in [0,1]; the quantile of standard deviation used for PLGEM fitting.
fittingEval: logical; if TRUE, the fit is evaluated by generating a diagnostic plot.
plot.file: logical; if TRUE, a PNG file is written to the current working directory.
verbose: logical; if TRUE, progress messages are printed while running.
plgem.fit fits a Power Law Global Error Model (PLGEM) to an expression set and optionally evaluates the quality of the fit. PLGEM describes the relationship between the standard deviation and the mean of gene expression values (or protein abundance levels) in a set of replicated microarray (or proteomics) samples by the following power law:
ln(modeledSpread) = PLGEMslope * ln(mean) + PLGEMintercept
This model has been shown to fit Affymetrix GeneChip datasets, as well as datasets of normalized spectral counts obtained by mass spectrometry-based proteomics (see References for details).
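Exponentiating both sides of the power law gives the equivalent form modeledSpread = exp(PLGEMintercept) * mean^PLGEMslope. The small R helper below evaluates that form; its name `modeled_spread` is hypothetical and it is not part of plgem.

```r
# Hypothetical helper (not a plgem function): evaluate the PLGEM power law.
# ln(spread) = slope * ln(mean) + intercept  is equivalent to
# spread = exp(intercept) * mean^slope, for mean > 0.
modeled_spread <- function(mean, slope, intercept) {
  exp(intercept) * mean^slope
}

# Example: slope 0.5 and intercept 0 give spread = sqrt(mean).
modeled_spread(c(1, 100, 10000), slope = 0.5, intercept = 0)
```

A slope of 1 would mean the spread grows proportionally to the mean (constant coefficient of variation), while slopes below 1 mean the relative spread shrinks as the mean grows.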
The ‘covariateNumb’ covariate (the first one by default) of the phenoData of the ExpressionSet ‘data’ is expected to contain the necessary information about the experimental design. The values of this covariate must be sample labels; samples with identical labels are treated as replicates.
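How a covariate index and a condition index could translate into a set of replicate columns is sketched below. `select_replicates` and the toy `pheno` table are hypothetical stand-ins for the ExpressionSet's phenoData; ordering conditions by first appearance, as base R's unique() does, is an assumption about what "the order of unique values" means.

```r
# Hypothetical sketch: pick replicate columns from a phenotype table.
# `pheno` stands in for the phenoData of the ExpressionSet (assumption).
select_replicates <- function(pheno, covariateNumb = 1, fit.condition = 1) {
  labels <- as.character(pheno[[as.integer(covariateNumb)]])
  # Conditions in order of first appearance (assumption about plgem's order).
  conditions <- unique(labels)
  # Samples whose label equals the chosen condition are treated as replicates.
  which(labels == conditions[as.integer(fit.condition)])
}

pheno <- data.frame(treatment = c("C", "C", "LPS", "LPS", "C"))
select_replicates(pheno, covariateNumb = 1, fit.condition = 1)
```

In this toy table, samples 1, 2 and 5 share the label "C" and would form the replicate set for condition 1.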
plgem.fit returns the ‘SLOPE’ and ‘INTERCEPT’ of the power law described above; it also returns the Pearson correlation coefficient (‘DATA.PEARSON’) of ln(mean) vs. ln(sd) in the original data, as well as the adjusted R squared (‘ADJ.R2.MP’) of the PLGEM fitted to the modelling points.
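The two goodness-of-fit statistics can be computed in base R as sketched below. The function name `fit_statistics_sketch` and its inputs (per-gene means and standard deviations, plus per-partition modelling points) are assumptions made for illustration; plgem.fit computes these values internally.

```r
# Hypothetical sketch of the two goodness-of-fit statistics (not plgem code).
fit_statistics_sketch <- function(gene_mean, gene_sd, mp_mean, mp_sd) {
  # Pearson correlation of ln(mean) vs ln(sd) over all genes (original data).
  data_pearson <- cor(log(gene_mean), log(gene_sd), method = "pearson")
  # Adjusted R^2 of the power law fitted to the modelling points only.
  mp_fit <- lm(log(mp_sd) ~ log(mp_mean))
  list(DATA.PEARSON = data_pearson,
       ADJ.R2.MP = summary(mp_fit)$adj.r.squared)
}
```

Because the modelling points summarize each partition by a single quantile, ADJ.R2.MP is typically much closer to 1 than DATA.PEARSON, which reflects the gene-level scatter.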
If argument ‘fittingEval’ is TRUE, a diagnostic plot of the goodness of the PLGEM fit is generated, consisting of four panels. The top-left panel shows the power law, characterized by ‘SLOPE’ and ‘INTERCEPT’. The top-right panel shows the distribution of model residuals. The bottom-left panel shows a contour plot of the ranked residuals. Finally, the bottom-right panel compares the distribution of observed residuals with the normal distribution. A good fit typically yields a horizontal, symmetric rank plot and a nearly normal distribution of residuals.
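The residuals behind the top-right and bottom-right panels can be approximated numerically as sketched below. Defining a residual as observed ln(sd) minus the PLGEM-predicted ln(sd), and summarizing normality by the correlation of a normal Q-Q plot, are assumptions; the function `residual_check_sketch` is hypothetical and does not reproduce the contour plot of ranked residuals.

```r
# Hypothetical sketch of residual checks (not plgem code).
residual_check_sketch <- function(gene_mean, gene_sd, slope, intercept) {
  # Residual per gene: observed ln(sd) minus the power-law prediction.
  res <- log(gene_sd) - (slope * log(gene_mean) + intercept)
  # Normal Q-Q coordinates without drawing a plot.
  qq <- qqnorm(res, plot.it = FALSE)
  # A Q-Q correlation near 1 suggests nearly normal residuals.
  list(residuals = res, qq_correlation = cor(qq$x, qq$y))
}
```

Passing `plot.it = FALSE` keeps the check usable in scripts; calling `qqnorm(res)` followed by `qqline(res)` instead would draw a panel similar in spirit to the bottom-right one.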
plgem.fit returns a list of five elements (see Details):
SLOPE: the slope of the fitted PLGEM.
INTERCEPT: the intercept of the fitted PLGEM.
DATA.PEARSON: the Pearson correlation coefficient of ln(mean) vs. ln(sd) in the original data.
ADJ.R2.MP: the adjusted R squared of the PLGEM fitted on the modelling points.
FIT.CONDITION: the condition used for fitting PLGEM.
Mattia Pelizzola mattia.pelizzola@gmail.com
Norman Pavelka nxp@stowers-institute.org
Pavelka N, Pelizzola M, Vizzardelli C, Capozzoli M, Splendiani A, Granucci F, Ricciardi-Castagnoli P. A power law global error model for the identification of differentially expressed genes in microarray data. BMC Bioinformatics. 2004 Dec 17;5:203. http://www.biomedcentral.com/1471-2105/5/203
Pavelka N, Fournier ML, Swanson SK, Pelizzola M, Ricciardi-Castagnoli P, Florens L, Washburn MP. Statistical similarities between transcriptomics and quantitative shotgun proteomics data. Mol Cell Proteomics. 2007 Nov 19; http://www.mcponline.org/cgi/content/abstract/M700240-MCP200v1
plgem.obsStn, plgem.resampledStn, plgem.pValue, plgem.deg, run.plgem
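library(plgem)
# load the example ExpressionSet shipped with plgem
data(LPSeset)
# fit PLGEM on the default covariate and condition, without the diagnostic plot
LPSfit <- plgem.fit(data = LPSeset, fittingEval = FALSE)
# strip names/attributes from each returned element for a compact display
sapply(LPSfit, function(x) return(as.vector(x)))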