ggm.test.edges {GeneTS} | R Documentation |
ggm.test.edges
assigns statistical significance to the edges in a GGM network by computing
p-values, q-values and posterior probabilities for each potential edge.
ggm.test.edges(r.mat, MAXKAPPA=5000, kappa=NULL, eta0=NULL, fA.type=c("nonparametric", "uniform"), df=7, plot.locfdr=1)
r.mat | matrix of partial correlations |
kappa | the degrees of freedom of the null distribution (will be estimated if left unspecified) |
eta0 | the proportion of true null values (will be estimated if left unspecified) |
fA.type | assumed type of alternative distribution - see also cor.fit.mixture |
MAXKAPPA | upper bound for the estimated kappa - see cor.fit.mixture (default: MAXKAPPA=5000) |
df | degrees of freedom for the spline fitting the density (only if fA.type="nonparametric") |
plot.locfdr | controls plot option in locfdr |
A mixture model is fitted to the partial correlations using cor.fit.mixture (this estimate can be overridden if values for both kappa and eta0 are specified). Subsequently, two-sided p-values for the test of non-zero correlation are computed for each edge using cor0.test. In addition, the corresponding posterior probabilities are computed (also using cor.fit.mixture). Finally, to simplify multiple testing, q-values are computed via fdr.control, with the specified value of eta0 taken into account.
Theoretical details are explained in Schaefer and Strimmer (2005), along with a simulation study and an application to gene expression data.
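The quantities described above can be sketched in base R. This is only an illustration under stated assumptions, not the GeneTS implementation: it assumes the null density of a correlation coefficient with kappa degrees of freedom is proportional to (1 - r^2)^((kappa - 3)/2), that the alternative is uniform on [-1, 1] (as for fA.type="uniform"), and that kappa and eta0 are already known. The helper names dcor.null, pval.null and post.prob are hypothetical and do not exist in the package.

```r
# Hypothetical helpers for illustration only; these are NOT GeneTS functions.

# Assumed null density of a correlation coefficient with kappa degrees of freedom
dcor.null <- function(r, kappa) {
  exp(lgamma(kappa / 2) - lgamma((kappa - 1) / 2) - 0.5 * log(pi)) *
    (1 - r^2)^((kappa - 3) / 2)
}

# Two-sided p-value for H0: correlation = 0, using the t transformation
# t = r * sqrt((kappa - 1) / (1 - r^2)) with kappa - 1 degrees of freedom
pval.null <- function(r, kappa) {
  t.stat <- r * sqrt((kappa - 1) / (1 - r^2))
  2 * pt(-abs(t.stat), df = kappa - 1)
}

# Posterior probability of a nonzero edge under the two-component mixture
# f(r) = eta0 * f0(r; kappa) + (1 - eta0) * fA(r), with fA uniform on [-1, 1]
post.prob <- function(r, kappa, eta0) {
  f0 <- dcor.null(r, kappa)
  fA <- 0.5
  1 - eta0 * f0 / (eta0 * f0 + (1 - eta0) * fA)
}

# Made-up inputs: three partial correlations, kappa = 50, eta0 = 0.9
r <- c(0.05, 0.3, 0.8)
p <- pval.null(r, kappa = 50)
prob <- post.prob(r, kappa = 50, eta0 = 0.9)
print(p)
print(prob)
```

In this sketch, larger absolute correlations give smaller p-values and larger posterior probabilities. In the package itself, kappa and eta0 would normally come from cor.fit.mixture rather than being fixed by hand.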
A sorted data frame with the following columns:
pcor | partial correlation (from r.mat) |
node1 | first node connected to edge |
node2 | second node connected to edge |
pval | p-value |
qval | q-value |
prob | probability that edge is nonzero |
Each row in the data frame corresponds to one edge, and the rows are sorted
according to the absolute strength of the correlation (from strongest to weakest).
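As a rough sketch of how the returned data frame might be used, the base R snippet below selects significant edges and turns them into an adjacency matrix. The data frame here is a toy stand-in with made-up numbers, not real output of ggm.test.edges, and the sketch assumes that node1 and node2 hold integer node indices and that the number of nodes (n.nodes) is known.

```r
# Toy stand-in for the data frame returned by ggm.test.edges;
# all numbers below are made up for illustration only.
edges <- data.frame(
  pcor  = c(0.62, -0.48, 0.11),
  node1 = c(1, 2, 1),
  node2 = c(3, 4, 2),
  pval  = c(1e-6, 2e-4, 0.30),
  qval  = c(1e-4, 0.01, 0.60),
  prob  = c(0.99, 0.96, 0.10)
)

# Logical subsetting also behaves sensibly when no edge passes the cutoff
sig <- edges[edges$qval <= 0.05, ]

# Symmetric 0/1 adjacency matrix for the selected edges
# (n.nodes is an assumption; here the toy network has 4 nodes)
n.nodes <- 4
adj <- matrix(0, n.nodes, n.nodes)
adj[cbind(sig$node1, sig$node2)] <- 1
adj[cbind(sig$node2, sig$node1)] <- 1
print(adj)
```

Selecting rows with a logical condition rather than with 1:n avoids picking up unintended rows when the count of significant edges is zero.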
Juliane Schaefer (http://www.statistik.lmu.de/~schaefer/) and Korbinian Strimmer (http://www.statistik.lmu.de/~strimmer/).
Schaefer, J., and Strimmer, K. (2005). An empirical Bayes approach to inferring large-scale gene association networks. Bioinformatics 21:754-764.
cor.fit.mixture, cor0.test, fdr.control, ggm.estimate.pcor.
# load GeneTS library
library("GeneTS")

# generate random network with 20 nodes and 5 percent edges
true.pcor <- ggm.simulate.pcor(20, 0.05)

# simulate data set of length 100
sim.dat <- ggm.simulate.data(100, true.pcor)

# estimate partial correlation matrix (simple estimator)
inferred.pcor <- ggm.estimate.pcor(sim.dat)

# p-values, q-values and posterior probabilities for each edge
#
# try both options for fA!
#test.results <- ggm.test.edges(inferred.pcor, fA.type="nonparametric")
test.results <- ggm.test.edges(inferred.pcor, fA.type="uniform")

# show best 20 edges (strongest correlation)
test.results[1:20,]

# how many are significant based on FDR cutoff Q=0.05 ?
num.significant.1 <- sum(test.results$qval <= 0.05)
test.results[1:num.significant.1,]

# how many are significant based on "local fdr" cutoff (prob > 0.95) ?
num.significant.2 <- sum(test.results$prob > 0.95)
test.results[1:num.significant.2,]

# parameters of the mixture distribution used to compute p-values etc.
c <- cor.fit.mixture(sm2vec(inferred.pcor), fA.type="uniform")
c$eta0
c$kappa