find.a0 {siggenes} | R Documentation |
Suggests an optimal value for the fudge factor in an EBAM analysis as proposed by Efron et al. (2001).
find.a0(data, cl, method = z.find, B = 100, delta = 0.9, quan.a0 = (0:5)/5, include.zero = TRUE, gene.names = dimnames(data)[[1]], n.chunk = 5, n.interval = 139, df.ratio = NULL, p0.estimation = c("splines", "adhoc", "interval"), lambda = NULL, ncs.value = "max", use.weights = FALSE, rand = NA, ...)
data |
a matrix, data frame or an ExpressionSet object.
Each row of data (or exprs(data) , respectively) must
correspond to a gene, and each column to a sample |
cl |
a numeric vector of length ncol(data) containing the class
labels of the samples. In the two class paired case, cl can also
be a matrix with ncol(data) rows and 2 columns. If data is
an ExpressionSet object, cl can also be a character string naming
the column of pData(data) that contains the class labels of the samples.
In the one-class case, cl should be a vector of 1's.
In the two class unpaired case, cl should be a vector containing 0's
(specifying the samples of, e.g., the control group) and 1's (specifying,
e.g., the case group).
In the two class paired case, cl can be either a numeric vector or
a numeric matrix. If it is a vector, then cl has to consist of the
integers between -1 and -n/2 (e.g., before treatment group) and between
1 and n/2 (e.g., after treatment group), where n is the length of
cl and k is paired with -k, k=1,...,n/2. If cl
is a matrix, one column should contain -1's and 1's specifying, e.g., the before
and the after treatment samples, respectively, and the other column should
contain integers between 1 and n/2 specifying the n/2 pairs of
observations.
In the multiclass case and if method=cat.stat , cl should be a
vector containing integers between 1 and g, where g is the number
of groups.
For examples of how cl can be specified, see the manual of siggenes |
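The class label layouts described above can be sketched in base R as follows. This is only an illustration for a hypothetical experiment with n = 8 samples; the object names (`cl.one`, `cl.unpaired`, and so on) and the ordering of the samples are assumptions, not part of siggenes.

```r
n <- 8  # hypothetical number of samples

# One-class case: a vector of 1's
cl.one <- rep(1, n)

# Two class unpaired case: 0 = e.g. control group, 1 = e.g. case group
cl.unpaired <- rep(c(0, 1), each = n / 2)

# Two class paired case, vector form: sample -k is paired with sample k
cl.paired <- c(-(1:(n / 2)), 1:(n / 2))

# Two class paired case, matrix form: one column identifies the pair,
# the other marks before (-1) and after (1) treatment
cl.paired.mat <- cbind(pair = rep(1:(n / 2), times = 2),
                       time = rep(c(-1, 1), each = n / 2))

# Multiclass case (method = cat.stat): integers between 1 and g, here g = 4
cl.multi <- rep(1:4, each = 2)
```

In this sketch the vector and matrix forms of the paired design encode the same pairing, since `pair * time` reproduces `cl.paired`.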
method |
the name of a function for computing the numerator and the denominator
of the test statistic of interest, and for specifying other objects required
for the identification of the fudge factor. The default function z.find
provides these objects for t- and F-statistics. It is, however, also possible
to employ a user-written function. For how to write such a function, see the
vignette of siggenes |
B |
the number of permutations used in the estimation of the null distribution |
delta |
a probability. All genes showing a posterior probability that is
larger than or equal to delta are called differentially expressed |
quan.a0 |
a numeric vector indicating over which quantiles of the standard deviations of the genes the fudge factor a0 should be optimized |
include.zero |
should a0 = 0, i.e. the unmodified test statistic, also be a possible choice for the fudge factor? |
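As a rough illustration of how quan.a0 and include.zero together define the candidate values of the fudge factor, the sketch below takes quantiles of gene-wise standard deviations on simulated data. It assumes that the plain row-wise standard deviation stands in for the denominator of the test statistic; in the package this quantity comes from the function given by method, so the values will differ.

```r
set.seed(42)

# Simulated expression matrix: 200 genes (rows), 10 samples (columns)
x <- matrix(rnorm(200 * 10), nrow = 200)

# Stand-in for the gene-wise standard deviations (an assumption)
s <- apply(x, 1, sd)

quan.a0 <- (0:5) / 5
include.zero <- TRUE

# Candidate fudge factors: the requested quantiles of s ...
a0 <- quantile(s, quan.a0)

# ... plus a0 = 0 (the unmodified statistic) if requested
if (include.zero) a0 <- c(0, a0)

a0
```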
gene.names |
a character vector of length nrow(data) containing the
names of the genes. By default the row names of data are used |
n.chunk |
an integer specifying in how many subsets the B permutations
should be split when computing the permuted test scores |
n.interval |
the number of intervals used in the logistic regression with repeated observations for estimating the ratio f0/f |
df.ratio |
integer specifying the degrees of freedom of the natural cubic spline used in the logistic regression with repeated observations |
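The idea behind the logistic regression with repeated observations can be sketched with base R and the splines package. This is not the siggenes implementation: the simulated scores, the number of permutations, and df = 5 are all assumptions made for illustration. Observed and permuted scores are counted per interval, a binomial model with a natural cubic spline is fitted to the counts, and the fitted probabilities are converted into an estimate of f0/f.

```r
library(splines)  # ships with R

set.seed(1)
m <- 2000  # number of genes (assumed)
B <- 20    # number of permutations (assumed)

# Simulated scores: 1800 null genes, 200 shifted genes, and m * B null scores
z.obs  <- c(rnorm(1800), rnorm(200, mean = 3))
z.null <- rnorm(m * B)

# Split the range of all scores into n.interval intervals
n.interval <- 139
breaks <- seq(min(z.obs, z.null), max(z.obs, z.null),
              length.out = n.interval + 1)
mid <- (breaks[-1] + breaks[-length(breaks)]) / 2

# Count observed and permuted scores per interval
n.obs  <- as.vector(table(cut(z.obs,  breaks, include.lowest = TRUE)))
n.null <- as.vector(table(cut(z.null, breaks, include.lowest = TRUE)))

d <- data.frame(mid, n.obs, n.null)
d <- d[d$n.obs + d$n.null > 0, ]  # drop empty intervals

# Probability that a score in an interval is an observed one,
# modelled by a natural cubic spline with 5 degrees of freedom
fit <- glm(cbind(n.obs, n.null) ~ ns(mid, df = 5),
           family = binomial, data = d)
pi.hat <- fitted(fit)

# Since pi = f / (f + B * f0), the ratio is f0/f = (1 - pi) / (B * pi)
ratio <- (1 - pi.hat) / (B * pi.hat)
```

In this simulation the estimated ratio is close to 1 around a score of 0 and much smaller around a score of 3, where most scores come from the shifted genes.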
p0.estimation |
either "splines" (default), "interval" , or "adhoc" .
If "splines" , the spline based method of Storey and Tibshirani (2003) is used to estimate
p0. If "adhoc" ("interval") , the adhoc (interval based)
method proposed by Efron et al. (2001) is used to estimate p0 |
lambda |
a numeric vector or value specifying the lambda values used in
the estimation of p0. If NULL , lambda is set to seq(0, 0.95, 0.05)
if p0.estimation = "splines" , and to 0.5 if p0.estimation = "interval" .
Ignored if p0.estimation = "adhoc" . For details, see pi0.est |
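The defaults for lambda described above can be written down as a small helper. The function name `default.lambda` is hypothetical and not part of siggenes; it only restates the rules given in the argument description.

```r
# Hypothetical helper mirroring the documented defaults for lambda
default.lambda <- function(p0.estimation = c("splines", "adhoc", "interval"),
                           lambda = NULL) {
  p0.estimation <- match.arg(p0.estimation)
  # lambda is ignored for the adhoc method
  if (p0.estimation == "adhoc") return(NULL)
  # a user-supplied value is kept as it is
  if (!is.null(lambda)) return(lambda)
  # documented defaults
  if (p0.estimation == "splines") seq(0, 0.95, 0.05) else 0.5
}

default.lambda()            # grid from 0 to 0.95 in steps of 0.05
default.lambda("interval")  # single value 0.5
```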
ncs.value |
a character string. Only used if p0.estimation = "splines" and
lambda is a vector. Either "max" or "paper" . For details, see
pi0.est |
use.weights |
should weights be used in the spline based estimation of p0? If
TRUE , 1 - lambda is used as weights. For details, see pi0.est |
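To illustrate what a weighted, spline-based estimate of p0 over a grid of lambda values can look like, here is a minimal sketch on simulated p-values. It uses `smooth.spline` from base R as a stand-in smoother with 3 degrees of freedom and evaluates the fit at the largest lambda; these choices are assumptions, and the actual computation is done by pi0.est and may differ in detail.

```r
set.seed(123)

# Simulated p-values: 4500 null (uniform) and 500 non-null (concentrated near 0)
p <- c(runif(4500), rbeta(500, 0.1, 1))

lambda <- seq(0, 0.95, 0.05)
use.weights <- TRUE

# Raw estimate for each lambda: share of p-values above lambda,
# rescaled by the width 1 - lambda of the interval (lambda, 1]
p0.raw <- sapply(lambda, function(l) mean(p > l) / (1 - l))

# Optional weights 1 - lambda, as described for use.weights
w <- if (use.weights) 1 - lambda else rep(1, length(lambda))

# Smooth the raw estimates and read off the value at the largest lambda
fit <- smooth.spline(lambda, p0.raw, w = w, df = 3)
p0 <- min(1, predict(fit, x = max(lambda))$y)
p0
```

The true proportion of null p-values in this simulation is 0.9, so the estimate should land in that neighbourhood.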
rand |
integer. If specified, i.e. not NA , the random number generator
will be set into a reproducible state |
... |
further arguments for the function specified by method . For
further arguments of method = z.find , see z.find |
The suggested choice for the fudge factor is the value of a0 that
leads to the largest number of genes showing a posterior probability larger
than or equal to delta.
However, a gene with a posterior probability of at least delta is only called
differentially expressed if no gene with a more extreme test score in the same
direction has a posterior probability less than delta.
For example, suppose that an EBAM analysis has been done with a t-test and that
the genes are ordered by their t-statistic. Suppose further that Gene 1 to Gene 5
(i.e. the five genes with the lowest t-statistics), Gene 7 and 8, Gene 3012 to 3020,
and Gene 3040 to 3051 are the only genes that show a posterior probability of at
least delta. Then Gene 1 to 5 and Gene 3040 to 3051 are called differentially
expressed, but Gene 7 and 8 and Gene 3012 to 3020 are not, since Gene 6 and
Gene 3021 to 3039 show a posterior probability less than delta.
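The calling rule from this example can be reproduced with a few lines of base R. The sketch assumes that there are 3051 genes in total, already ordered by their t-statistic, and uses made-up posterior probabilities (0.95 for the genes named in the example, 0.5 for all others); it is not the code used inside siggenes.

```r
delta <- 0.9
n.genes <- 3051  # assumed total number of genes, ordered by t-statistic

# Made-up posterior probabilities matching the example
post <- rep(0.5, n.genes)
post[c(1:5, 7:8, 3012:3020, 3040:3051)] <- 0.95

pass <- post >= delta

# Number of genes reaching delta before the first failure, from each end
n.low <- if (all(pass)) n.genes else which(!pass)[1] - 1
n.up  <- if (all(pass)) n.genes else which(!rev(pass))[1] - 1

# Genes called differentially expressed: unbroken runs at both tails
called <- sort(unique(c(seq_len(n.low),
                        rev(seq_len(n.genes))[seq_len(n.up)])))

range(called[called <= 5])   # Gene 1 to 5
range(called[called > 5])    # Gene 3040 to 3051
```

Gene 7 and 8 and Gene 3012 to 3020 reach delta but are not in `called`, because Gene 6 and Gene 3021 to 3039 interrupt the runs.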
an object of class FindA0
The numbers of differentially expressed genes can differ between find.a0
and ebam, even though the same value of the fudge factor is used, since
in find.a0 the observed and permuted test scores are monotonically
transformed such that the observed scores follow a standard normal distribution
(if the test statistic can take both positive and negative values) or
an F-distribution (if the test statistic can only take positive values) for each
possible choice of the fudge factor.
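One simple example of a monotone transformation that makes a set of scores follow a standard normal distribution is a rank-based normal score transform. The sketch below uses simulated scores and is only meant to show the idea; the transformation applied inside find.a0 may be constructed differently.

```r
set.seed(7)

# Simulated observed test scores with heavier tails than a normal distribution
z <- rt(1000, df = 5)

# Rank-based normal scores: map the rank of each score to the matching
# standard normal quantile. The ordering of the scores is preserved.
z.norm <- qnorm((rank(z) - 0.5) / length(z))

# The transformed scores are approximately standard normal
c(mean = mean(z.norm), sd = sd(z.norm))
```

Because the transform is monotone, the ranking of the genes is unchanged, but the numerical values of the scores and therefore quantities derived from them can change.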
Holger Schwender, holger.schw@gmx.de
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment, JASA, 96, 1151-1160.
## Not run: 
# Load siggenes and the package multtest, which contains the data.
library(siggenes)
library(multtest)

# Load the data of Golub et al. (1999) contained in the package multtest.
data(golub)

# golub.cl contains the class labels.
golub.cl

# Obtain the number of differentially expressed genes and the FDR for the
# default set of values for the fudge factor.
find.out <- find.a0(golub, golub.cl, rand = 123)
find.out

# Obtain the number of differentially expressed genes and the FDR when using
# the t-statistic assuming equal group variances.
find.out2 <- find.a0(golub, golub.cl, var.equal = TRUE, rand = 123)

# Using the output of the first analysis with find.a0, the number of
# differentially expressed genes and the FDR for other values of
# delta, e.g., 0.95, can be obtained by
print(find.out, 0.95)

# The logit-transformed posterior probabilities can be plotted by
plot(find.out)

# To avoid the logit-transformation, set logit = FALSE.
plot(find.out, logit = FALSE)

## End(Not run)