fitGG {gaga} | R Documentation |
Fits GaGa or MiGaGa hierarchical models, either via a fully Bayesian approach or via maximum likelihood.
fitGG(x, groups, patterns, equalcv = TRUE, nclust = 1, method = "quickEM", B, priorpar, parini, trace = TRUE)
x |
ExpressionSet, exprSet, data frame or matrix
containing the gene expression measurements used to fit the model. |
groups |
If x is of type ExpressionSet or
exprSet, groups should be the name of the column
in pData(x) with the groups that one wishes to compare. If
x is a matrix or a data frame, groups should be a
vector indicating the group to which each column in x
corresponds. |
patterns |
Matrix indicating which groups are put together under
each pattern, i.e. the hypotheses to consider for each
gene. colnames(patterns) must match the group levels
specified in groups.
Defaults to two hypotheses: null hypothesis of all groups
being equal and full alternative of all groups being different. |
equalcv |
equalcv==TRUE fits the model assuming a constant coefficient of variation (CV) across groups. equalcv==FALSE compares the CV as well as the mean expression levels between groups. |
nclust |
Number of clusters in the MiGaGa model. nclust==1
corresponds to the GaGa model. |
method |
method=='MH' fits a fully Bayesian model via
Metropolis-Hastings posterior sampling. method=='Gibbs' does
the same using Gibbs sampling. method=='SA' uses Simulated
Annealing to find the posterior mode. method=='EM' finds
maximum-likelihood estimates via the expectation-maximization
algorithm, but this is currently only implemented for
nclust>1. method=='quickEM' is a quicker
implementation that only performs 2 optimization steps (see details). |
B |
Number of iterations. For method=='MH' and method=='Gibbs', B
is the number of MCMC iterations (defaults to 1000). For
method=='SA', B is the number of iterations in the
Simulated Annealing scheme (defaults to 200). For
method=='EM', B is the maximum number of iterations
(defaults to 20). |
priorpar |
List with prior parameter values. It must have
components a.alpha0, b.alpha0, a.nu, b.nu, a.balpha, b.balpha, a.nualpha, b.nualpha, p.probclus
and p.probpat. If missing, they are set to non-informative
values that are usually reasonable for RMA and GCRMA normalized data. |
parini |
List with components a0, nu,
balpha, nualpha, probclus and probpat
indicating the starting values for the hyper-parameters. If not
specified, a method of moments estimate is used. |
trace |
For trace==TRUE the progress of the model fitting
routine is printed. |
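As an illustration of how the groups and patterns arguments fit together, the following base R sketch builds both for a hypothetical three-group design. The group labels and the number of samples per group are assumptions made up for the illustration, and the integer coding (groups sharing a value within a row are treated as equal under that hypothesis) is inferred from the two-group example on this page rather than stated in the argument description.

```r
# Hypothetical three-group design: 'groups' labels each column of a matrix x.
groups <- factor(rep(c("group 1", "group 2", "group 3"), each = 4))

# One row per hypothesis, one column per group. Groups sharing the same
# integer within a row are treated as equal under that hypothesis
# (coding inferred from the two-group example on this page).
patterns <- matrix(c(0, 0, 0,   # all groups equal
                     0, 0, 1,   # group 3 differs from groups 1 and 2
                     0, 1, 1,   # group 1 differs from groups 2 and 3
                     0, 1, 2),  # all groups different
                   ncol = 3, byrow = TRUE)
colnames(patterns) <- levels(groups)

# colnames(patterns) must match the group levels given in 'groups'.
stopifnot(setequal(colnames(patterns), levels(groups)))
patterns
```

The first and last rows correspond to the two default hypotheses (all groups equal, all groups different); the middle rows are optional intermediate hypotheses.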
An approximation is used to sample faster from the
posterior distribution of the gamma shape parameters and to compute
the normalization constants (needed to evaluate the likelihood). These
approximations are implemented in rcgamma and mcgamma.
The cooling scheme in method=='SA' uses a temperature equal to
1/log(1+i), where i is the iteration number.
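To get a feel for how quickly this schedule cools, the temperatures can be tabulated in base R. This is only a sketch of the stated formula, not the package's internal code; the variable names are chosen for the illustration, and B is set to the documented default of 200.

```r
B <- 200                      # number of SA iterations (the documented default)
i <- seq_len(B)               # iteration numbers 1, ..., B
temperature <- 1 / log(1 + i) # cooling scheme 1/log(1+i)

# The temperature decreases monotonically with the iteration number.
stopifnot(all(diff(temperature) < 0))
round(temperature[c(1, 10, 100, 200)], 3)
```

Because the decay is logarithmic, most of the cooling happens in the first few iterations and the temperature changes slowly afterwards.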
method=='quickEM' is a shortened EM algorithm that performs only
2 optimization steps. It usually delivers hyper-parameter estimates very
similar to those obtained via the slower method=='EM'. Additionally,
the GaGa model inference has been seen to be robust to moderate
changes in the hyper-parameter estimates in most datasets.
An object of class gagafit, with components
parest |
Hyper-parameter estimates. Only returned if method=='EBayes'. For method=='Bayes' one must call the function parest after fitGG. |
mcmc |
Object of class mcmc with posterior draws for hyper-parameters. Only returned if method=='Bayes'. |
lhood |
For method=='Bayes' it is the log-likelihood evaluated at each MCMC iteration. For method=='EBayes' it is the log-likelihood evaluated at the maximum. |
nclust |
Same as input argument. |
patterns |
Same as input argument, converted to object of class gagahyp. |
David Rossell
Rossell D. GaGa: a simple and flexible hierarchical model for microarray data analysis. http://rosselldavid.googlepages.com.
parest to estimate hyper-parameters and compute
posterior probabilities after a GaGa or MiGaGa fit.
findgenes to find differentially expressed genes.
classpred to predict the group that a new sample belongs to.
library(gaga)
set.seed(10)
n <- 100; m <- c(6,6)
a0 <- 25.5; nu <- 0.109
balpha <- 1.183; nualpha <- 1683
probpat <- c(.95,.05)
xsim <- simGG(n,m,p.de=probpat[2],a0,nu,balpha,nualpha,equalcv=TRUE)
x <- exprs(xsim)

#Frequentist fit: EM algorithm to obtain MLE
groups <- pData(xsim)$group[c(-6,-12)]
patterns <- matrix(c(0,0,0,1),2,2)
colnames(patterns) <- c('group 1','group 2')
gg1 <- fitGG(x[,c(-6,-12)],groups,patterns=patterns,method='EM',trace=FALSE)
gg1 <- parest(gg1,x=x[,c(-6,-12)],groups)
gg1