SIM-package {SIM}R Documentation

Statistical Integration of Microarrays

Description

SIM is a statistical model to identify copy number changes that affect the expression of genes within the same chromosomal region. Copy number is considered as the dependent variable and expression as the independent variable. Copy number alterations may span many expression probes and affect them in a possibly subtle but consistent way. Therefore, we test whether copy number is associated with a set of expression levels within a chromosome arm (or mimimal common region) in a random-effect model. Association scores for individual expression levels (z-scores) are also calculated. For more information on the random-effect model, see ?globaltest.

Each sample should be profiled both on a copy number and on an expression array. The array platforms used for DNA and RNA analysis may be different as long as the probes have mapped to the genome. RESOURCERER can be used to search chromosome and basepair location for expression microarray probes (http://compbio.dfci.harvard.edu/tgi/cgi-bin/magic/r1.pl). See RESOURCERER.annotation.to.ID on how to insert this information as annotation columns. Alternatively, the chromosome, basepair locations and gene symbol can be extracted from AnnotationData packages available in Bioconductor or generated using the AnnBuilder package.

When copy number data is run as dependent variable, we use method.adjust="BY" for multiple testing correction. This method accounts for dependence between measurements and is more conservative than "BH". For details on the multiple testing correction methods see ?p.adjust. We have experienced that a rather low stringency cut-off on the BY-values of 20% allows the detection of associations for data with a low number of samples or a low frequency of abberations. False positives are rarely observed.

Make sure that the array probes are mapped to the same builds of the genome, and that the chrom.table used by the integrated.analysis is from the same build as well. See sim.update.chrom.table.

Details

Package: SIM
Type: Package
Version: 1.9.0
Date: 2008-02-06
License: Open

Author(s)

Marten Boetzer, Melle Sieswerda, Renee X. de Menezes R.X.Menezes@lumc.nl

References

R.X. de Menezes, M. Boetzer, M. Sieswerda, G.J.B. van Ommen, J.M. Boer Integrated Statistical analysis to identify associations between DNA copy number and gene expression in microarray data. Submitted.

See Also

assemble.data, integrated.analysis, sim.plot.zscore.heatmap, sim.plot.pvals.on.region, sim.plot.pvals.on.genome, tabulate.pvals, tabulate.top.dep.features, tabulate.top.indep.features, impute.nas.by.surrounding, sim.update.chrom.table

Examples

#load the datasets and the samples to run the integrated analysis
data(expr.data)
data(acgh.data)
data(samples) 
         
#assemble the data
assemble.data(dep.data = acgh.data, indep.data = expr.data,ann.dep = colnames(acgh.data)[1:4], ann.indep = colnames(expr.data)[1:4], dep.id="ID", dep.chr = "CHROMOSOME",dep.pos = "STARTPOS",dep.symb="Symbol",  indep.id="ID",indep.chr = "CHROMOSOME", indep.pos = "STARTPOS", indep.symb="Symbol", overwrite = TRUE,run.name = "chr8")

#run the integrated analysis
integrated.analysis(samples = samples, input.regions = 8, adjust=FALSE, zscores=TRUE, method = "auto", run.name = "chr8")

# use functions to plot the results of the integrated analysis

#plot the p-values along the genome
sim.plot.pvals.on.genome(input.regions = 8,adjust.method = "BY",pdf = FALSE, run.name = "chr8")

#plot the p-values along the regions
sim.plot.pvals.on.region(input.regions = 8, adjust.method="BY", run.name = "chr8")

#plot the z-scores in an association heatmap
sim.plot.zscore.heatmap(input.regions = 8, significance=0.2, z.threshold=3, show.names.dep=TRUE,show.names.indep=TRUE, adjust.method = c("BY"), scale="auto", plot.method = "smooth", pdf = FALSE, run.name = "chr8")

#tabulate the p-values per region (prints to screen)
tabulate.pvals(input.regions = 8,adjust.method="BY", bins=c(0.001,0.005,0.01,0.025,0.05,0.075,0.10,0.20,1.0), significance.idx=8, order.by="%", decreasing=TRUE, run.name = "chr8") 

#get the top dependent features sorted by p-value
tabulate.top.dep.features(input.regions = 8, adjust.method="BY",run.name = "chr8")

#get the top independent features sorted by mean z-score
tabulate.top.indep.features(input.regions = 8,adjust.method="BY", significance=0.2, sort.order='positive', run.name = "chr8")

[Package SIM version 1.10.0 Index]