fitGG {gaga} R Documentation

Fit GaGa hierarchical model

Description

Fits GaGa or MiGaGa hierarchical models, either via a fully Bayesian approach or via maximum likelihood.

Usage

fitGG(x, groups, patterns, equalcv = TRUE, nclust = 1, method = "quickEM", B, priorpar, parini, trace = TRUE)

Arguments

x ExpressionSet, exprSet, data frame or matrix containing the gene expression measurements used to fit the model.
groups If x is of type ExpressionSet or exprSet, groups should be the name of the column in pData(x) containing the groups that one wishes to compare. If x is a matrix or a data frame, groups should be a vector indicating the group to which each column of x belongs.
patterns Matrix indicating which groups are put together under each pattern, i.e. the hypotheses to consider for each gene. colnames(patterns) must match the group levels specified in groups. Defaults to two hypotheses: null hypothesis of all groups being equal and full alternative of all groups being different.
equalcv equalcv==TRUE fits the model assuming a constant coefficient of variation (CV) across groups. equalcv==FALSE compares the CV as well as the mean expression levels between groups.
nclust Number of clusters in the MiGaGa model. nclust=1 corresponds to the GaGa model.
method method=='MH' fits a fully Bayesian model via Metropolis-Hastings posterior sampling. method=='Gibbs' does the same using Gibbs sampling. method=='SA' uses Simulated Annealing to find the posterior mode. method=='EM' finds maximum-likelihood estimates via the expectation-maximization algorithm, but this is currently only implemented for nclust>1. method=='quickEM' is a quicker implementation that only performs 2 optimization steps (see details).
B Number of iterations. For method=='MH' and method=='Gibbs', B is the number of MCMC iterations (defaults to 1000). For method=='SA', B is the number of iterations in the Simulated Annealing scheme (defaults to 200). For method=='EM', B is the maximum number of iterations (defaults to 20).
priorpar List with prior parameter values. It must have components a.alpha0, b.alpha0, a.nu, b.nu, a.balpha, b.balpha, a.nualpha, b.nualpha, p.probclus and p.probpat. If missing, they are set to non-informative values that are usually reasonable for RMA and GCRMA normalized data.
parini List with components a0, nu, balpha, nualpha, probclus and probpat indicating the starting values for the hyper-parameters. If not specified, a method-of-moments estimate is used.
trace If trace==TRUE, the progress of the model fitting routine is printed.
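
As a rough sketch of how groups and patterns relate when x is a matrix or data frame, the base R snippet below builds both for a hypothetical three-group design. The group names and the sample layout are made up for illustration. The integer coding (groups sharing a value within a row are taken to be equal under that hypothesis) is assumed by analogy with the two-group matrix in the Examples section.

```r
# Hypothetical design: three groups, two samples (columns of x) per group.
groups <- rep(c("control", "treatA", "treatB"), each = 2)

# One row per hypothesis, one column per group. Assumed coding: groups that
# share the same integer within a row are equal under that hypothesis.
patterns <- matrix(c(0, 0, 0,   # all groups equal (null hypothesis)
                     0, 1, 1,   # control differs from both treatments
                     0, 1, 2),  # all groups different (full alternative)
                   ncol = 3, byrow = TRUE)
colnames(patterns) <- unique(groups)

# colnames(patterns) must match the group levels given in groups.
stopifnot(setequal(colnames(patterns), unique(groups)))
patterns
```

With only the first and last rows, this reduces to the default described for patterns: a null hypothesis plus the full alternative.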

Details

An approximation is used to sample faster from the posterior distribution of the gamma shape parameters and to compute the normalization constants (needed to evaluate the likelihood). These approximations are implemented in rcgamma and mcgamma.

The cooling scheme in method=='SA' uses a temperature equal to 1/log(1+i), where i is the iteration number.
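
To get a feel for how quickly this schedule cools, the temperature can be tabulated directly from the formula above. This is only an illustration in base R; sa_temperature is a hypothetical helper name, not a function from the package.

```r
# Temperature at iteration i, following the stated formula 1/log(1+i).
# sa_temperature is a hypothetical helper, not part of gaga.
sa_temperature <- function(i) 1 / log(1 + i)

B <- 200                          # default number of iterations for method=='SA'
temps <- sa_temperature(seq_len(B))

# Temperatures at a few selected iterations
round(temps[c(1, 10, 100, 200)], 3)
```

The temperature decreases monotonically but only logarithmically, so most of the cooling happens in the first few iterations.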

method=='quickEM' is a faster EM implementation that performs only 2 optimization steps. It usually delivers hyper-parameter estimates very similar to those obtained via the slower method=='EM'. Additionally, inference under the GaGa model has been observed to be robust to moderate changes in the hyper-parameter estimates in most datasets.

Value

An object of class gagafit, with components

parest Hyper-parameter estimates. Only returned if method=='EBayes'; for method=='Bayes' one must call the function parest after fitGG.
mcmc Object of class mcmc with posterior draws for hyper-parameters. Only returned if method=='Bayes'.
lhood For method=='Bayes', the log-likelihood evaluated at each MCMC iteration. For method=='EBayes', the log-likelihood evaluated at the maximum.
nclust Same as input argument.
patterns Same as input argument, converted to object of class gagahyp.

Author(s)

David Rossell

References

Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. http://rosselldavid.googlepages.com.

See Also

parest to estimate hyper-parameters and compute posterior probabilities after a GaGa or MiGaGa fit. findgenes to find differentially expressed genes. classpred to predict the group that a new sample belongs to.

Examples

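library(gaga)
set.seed(10)
#Simulate data: n genes, m samples in each of two groups
n <- 100; m <- c(6,6)
#Hyper-parameter values used for the simulation
a0 <- 25.5; nu <- 0.109
balpha <- 1.183; nualpha <- 1683
#Probabilities of the null and alternative patterns
probpat <- c(.95,.05)
xsim <- simGG(n,m,p.de=probpat[2],a0,nu,balpha,nualpha,equalcv=TRUE)
x <- exprs(xsim)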

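#Frequentist fit: EM algorithm to obtain MLE
#Columns 6 and 12 are left out of the fit
groups <- pData(xsim)$group[c(-6,-12)]
#Two hypotheses: both groups equal, or both groups different
patterns <- matrix(c(0,0,0,1),2,2)
colnames(patterns) <- c('group 1','group 2')
gg1 <- fitGG(x[,c(-6,-12)],groups,patterns=patterns,method='EM',trace=FALSE)
#Estimate hyper-parameters and compute posterior probabilities
gg1 <- parest(gg1,x=x[,c(-6,-12)],groups)
gg1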


[Package gaga version 1.2.0 Index]