EOC {OCplus} | R Documentation |
EOC computes and optionally plots the estimated operating characteristics for data from a microarray experiment with two groups of subjects. The false discovery rate (FDR) is estimated based on random permutations of the data and plotted against the cutoff level on the t-statistic; a curve for the classical sensitivity can be added. Different curves for different proportions of non-differentially expressed genes can be compared in the same plot, and the sample size per group can be varied between plots.
FDRp is the function that does the underlying hard work and requires package multtest.
EOC(xdat, grp, p0, paired = FALSE, nperm = 25, seed = NULL, plot = TRUE, ...)
FDRp(xdat, grp, test = "t.equalvar", p0, nperm, seed)
xdat |
the matrix of expression values, with genes as rows and samples as columns |
grp |
a grouping variable giving the class membership of each sample, i.e. each column in xdat; for EOC, this can be any type of variable, as long as it has exactly two distinct values, whereas FDRp expects to see only 0s and 1s, see Details. |
p0 |
if supplied, an estimate for the proportion of non-differentially expressed genes; if not supplied, the routine will estimate it, see Details. |
paired |
logical value indicating whether this is an independent sample situation (default) or a paired sample situation. Note that paired samples need to follow each other in the data matrix (as in 010101...). |
nperm |
number of permutations for establishing the null distribution of the t-statistic |
test |
the type of test to use, see mt.teststat; when called from EOC, this is always the default. |
seed |
the random seed from which the permutations are started |
plot |
logical value indicating whether to do the plot |
... |
graphical parameters, passed to plot.FDR.result |
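The coding requirements for grp can be checked before calling either function. The following is a minimal base-R sketch and not part of OCplus: the helper name to01 is hypothetical, and mapping the first factor level to 0 is an assumption made only for illustration.

```r
# Hypothetical helper (not part of OCplus): recode a two-valued grouping
# variable as 0/1, the coding that FDRp expects.
to01 <- function(grp) {
  f <- factor(grp)
  if (nlevels(f) != 2) stop("grp must have exactly two distinct values")
  as.integer(f) - 1L   # assumption: first level -> 0, second level -> 1
}

# Independent samples: 25 columns of one group followed by 25 of the other
grp <- rep(c("Sample A", "Sample B"), c(25, 25))
grp01 <- to01(grp)
table(grp01)

# Paired samples: the two members of each pair sit in adjacent columns,
# so the recoded labels alternate 0 1 0 1 ...
grp_paired <- to01(rep(c("Sample A", "Sample B"), 25))
is_alternating <- all(grp_paired == rep(c(0L, 1L), 25))
```

A check like is_alternating is a cheap way to confirm the column arrangement before setting paired = TRUE.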
EOC is the empirical counterpart of the function TOC. It estimates the FDR and sensitivity for a given data set of expression values measured on subjects in two groups. The FDR is estimated locally based on the empirical Bayes approach outlined by Efron et al., see References. FDRp implements the details of this method; this requires, among other things, the permutation distribution of the t-statistic, which is calculated via a call to function mt.teststat of package multtest. This is why both functions fail on missing values in the expression data.
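The idea of a permutation distribution for the t-statistic can be illustrated in base R. This is only a sketch under stated assumptions, not the code used by FDRp or mt.teststat: the helper row_t is hypothetical, the data are simulated, and the sign convention (group 1 minus group 0) is chosen arbitrarily.

```r
set.seed(1)
xdat <- matrix(rnorm(200 * 10), nrow = 200)   # 200 genes, 10 samples
grp  <- rep(c(0, 1), each = 5)

# Reject missing values up front, as the permutation step cannot handle them
stopifnot(!anyNA(xdat))

# Hypothetical helper: equal-variance two-sample t-statistic for every row
row_t <- function(x, g) {
  x1 <- x[, g == 1, drop = FALSE]
  x0 <- x[, g == 0, drop = FALSE]
  n1 <- ncol(x1)
  n0 <- ncol(x0)
  m1 <- rowMeans(x1)
  m0 <- rowMeans(x0)
  # pooled variance from the within-group sums of squares
  sp2 <- (rowSums((x1 - m1)^2) + rowSums((x0 - m0)^2)) / (n1 + n0 - 2)
  (m1 - m0) / sqrt(sp2 * (1 / n1 + 1 / n0))
}

t_obs <- row_t(xdat, grp)

# Null distribution: recompute the statistics after shuffling the labels
nperm  <- 25
t_perm <- replicate(nperm, row_t(xdat, sample(grp)))
dim(t_perm)   # one row per gene, one column per permutation
```

Each column of t_perm holds the statistics for one random relabelling of the samples; pooled over genes and permutations, they approximate the distribution of the t-statistic under no differential expression.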
Note that p0 is by default estimated from the data, as originally suggested by Efron et al., so that p0 times the ratio between the density of the permutation distribution and the density of the observed distribution of t-statistics does not exceed 1; alternatively, the user can supply their own estimate of the proportion of non-differentially expressed genes in the data.
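The bound behind this default can be sketched with kernel density estimates. The code below is an illustration under assumptions and not the estimator implemented in FDRp: the statistics are simulated stand-ins, the densities come from base R's density(), and restricting the comparison to a central grid is a choice made here to avoid unstable tail estimates.

```r
set.seed(2)
# Stand-in statistics: 95 percent null genes, 5 percent shifted
t_obs  <- c(rnorm(950), rnorm(50, mean = 4))
# Stand-in for the pooled permutation statistics
t_null <- rnorm(1000 * 25)

# Evaluate both density estimates on a common central grid (assumption)
grid <- seq(-2, 2, length.out = 201)
f  <- approx(density(t_obs),  xout = grid)$y   # observed density
f0 <- approx(density(t_null), xout = grid)$y   # permutation (null) density

# Largest p0 for which p0 * f0 / f stays at or below 1 on the grid
p0_hat <- min(1, min(f / f0))

# Resulting local FDR estimate on the grid
local_fdr <- p0_hat * f0 / f
range(local_fdr)
```

By construction local_fdr never exceeds 1 on the grid; a user-supplied p0 would simply replace p0_hat in the last step.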
Note also that FDRp keeps all permutations in memory during computations. For a large number of genes, this will limit the number of possible permutations.
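A rough lower bound on the memory involved follows from the size of the permuted statistics alone. The sketch below assumes one double-precision value (8 bytes) per gene and permutation and ignores any working copies, so actual usage will be higher; the gene count is an arbitrary example.

```r
ngenes <- 20000                 # example number of genes (assumption)
nperm  <- c(25, 100, 1000)      # candidate numbers of permutations

# genes x permutations x 8 bytes per double
bytes <- ngenes * nperm * 8
mem   <- data.frame(nperm = nperm, MiB = bytes / 2^20)
mem
```

Memory grows linearly in both the number of genes and nperm, so doubling either doubles the footprint.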
For EOC, the return value is an object of class FDR.result, which inherits from class data.frame. The three columns list for each gene its t-statistic, the estimated FDR (two-sided), and the estimated sensitivity. Additionally, the object carries an attribute param, which is a list with four entries: p0, the assumed proportion of non-differentially expressed genes used in calculating the FDR; p0.est, a logical value indicating whether p0 was estimated or user-supplied; statistic, which indicates how the t-statistic was computed, i.e. how its sign should be interpreted in terms of relative over- or under-expression; and paired, a logical flag indicating whether a paired t-statistic was used.
FDRp returns a list with essentially the same elements, plus the values of the observed and permuted distribution of the t-statistics for each gene.
Both the curve labels and the legend may be squashed if the plotting device is too small. Increasing the size of the device and re-plotting should improve readability.
Y. Pawitan and A. Ploner
Pawitan Y, Michiels S, Koscielny S, Gusnanto A, Ploner A (2005) False Discovery Rate, Sensitivity and Sample Size for Microarray Studies. Bioinformatics, 21, 3017-3024.
Efron B, Tibshirani R, Storey JD, Tusher V (2001) Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.
plot.FDR.result, OCshow, mt.teststat
# We simulate a small example with 5 percent regulated genes and
# a rather large effect size
set.seed(2003)
xdat = matrix(rnorm(50000), nrow=1000)
xdat[1:25, 1:25] = xdat[1:25, 1:25] - 2
xdat[26:50, 1:25] = xdat[26:50, 1:25] + 2
grp = rep(c("Sample A","Sample B"), c(25,25))

# The default, with legend
ret = EOC(xdat, grp, legend=TRUE)

# Look at the results: yes
ret[1:10,]
which(ret$FDR<0.05)

# Extra information
attr(ret,"param")

# Run the same data with different permutations: fairly stable, but with
# different p0
ret = EOC(xdat, grp, seed=2000)
which(ret$FDR<0.07)

# Misspecify the p0: not too bad here
ret = EOC(xdat, grp, p0=0.99)
which(ret$FDR<0.01)

# We simulate data in a paired setting
# Note the arrangement of the columns
set.seed(2004)
xdat = matrix(rnorm(50000), nrow=1000)
ndx1 = seq(1,50, by=2)
xdat[1:25, ndx1] = xdat[1:25, ndx1] - 2
xdat[26:50, ndx1] = xdat[26:50, ndx1] + 2
grp = rep(c("Sample A","Sample B"), 25)
ret = EOC(xdat, grp, paired=TRUE)
which(ret$FDR<0.05)