readBeadSummaryData {beadarray} | R Documentation |
Function to read the output of Illumina's BeadStudio software into beadarray
readBeadSummaryData(dataFile, qcFile=NULL, sampleSheet=NULL, sep="\t", skip=8, ProbeID="ProbeID", columns = list(exprs = "AVG_Signal", se.exprs="BEAD_STDERR", NoBeads = "Avg_NBEADS", Detection="Detection Pval"), qc.sep="\t", qc.skip=8, controlID="ProbeID", qc.columns = list(exprs="AVG_Signal", se.exprs="BEAD_STDERR", NoBeads="Avg_NBEADS", Detection="Detection Pval"), annoPkg=NULL, dec=".", quote="")
dataFile |
character string specifying the name of the file containing the BeadStudio output for each probe on each array in an experiment (required). Ideally this should be the 'SampleProbeProfile' from BeadStudio. |
qcFile |
character string giving the name of the file containing the control probe intensities (optional). This file should be either the 'ControlProbeProfile' or 'ControlGeneProfile' from BeadStudio. |
sampleSheet |
character string used to specify the file containing sample information (optional) |
sep |
field separator character for the dataFile ("\t" for
tab delimited or "," for comma separated) |
skip |
number of header lines to skip at the top of dataFile.
Default value is 8. |
ProbeID |
character string of the column in dataFile that contains
identifiers that can be used to uniquely identify each probe |
columns |
list defining the column headings in dataFile which
correspond to the matrices stored in the assayData slot of the final ExpressionSetIllumina object |
qc.sep |
field separator character for qcFile |
qc.skip |
number of header lines to skip at the top of qcFile |
controlID |
character string specifying the column in qcFile that contains
the identifiers that uniquely identify each control probe |
qc.columns |
list defining the column headings in qcFile which
correspond to the matrices stored in the QCInfo slot of
the final ExpressionSetIllumina object |
annoPkg |
character string specifying the name of the annotation package (only available for certain expression arrays at present) |
dec |
the character used in the dataFile and qcFile for decimal points |
quote |
the set of quoting characters (disabled by default) |
This function can be used to read gene expression data exported
from versions 1, 2 and 3 of the Illumina BeadStudio application.
The format of the BeadStudio output will depend on the version number.
For example, the file may be comma or tab separated or have header
information at the top of the file. The parameters sep and skip
can be used to adapt the function as required (e.g. skip=7 is
appropriate for data from earlier versions of BeadStudio, and skip=0 is
required if header information hasn't been exported).
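The effect of sep and skip can be illustrated with base R alone. The sketch below is not how readBeadSummaryData is implemented; it only assumes that the two arguments behave like the arguments of the same name in read.table. The file contents, including the two header lines, are made up for illustration.

```r
# Hypothetical miniature export: two header lines, then a tab-separated table.
lines <- c(
  "Example header line 1",          # made-up header text
  "Example header line 2",
  "ProbeID\tAVG_Signal-ARRAY1\tAVG_Signal-ARRAY2",
  "10001\t150.2\t148.9",
  "10002\t89.5\t92.1"
)
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# skip = 2 drops the header block; sep = "\t" matches the delimiter.
# check.names = FALSE keeps headings such as "AVG_Signal-ARRAY1" unchanged.
tab <- read.table(tmp, sep = "\t", skip = 2, header = TRUE,
                  check.names = FALSE, quote = "", dec = ".")
print(dim(tab))
print(colnames(tab))
unlink(tmp)
```

If skip were set too low here, the header text would be parsed as data, which is the kind of failure the skip argument is meant to avoid.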
The format of the BeadStudio file is assumed to have one row for each probe sequence in the experiment and a set number of columns for each array. The columns which are exported for each array are chosen by the user when running BeadStudio. At a minimum, columns for average intensity, standard error, the number of beads and detection scores should be exported, along with a column which contains a unique identifier for each bead type (usually named "ProbeID").
It is assumed that the average bead intensities for each array appear in
columns with headings of the form 'AVG_Signal-ARRAY1',
'AVG_Signal-ARRAY2',...,'AVG_Signal-ARRAYN' for the N arrays found in the
file. All other column headings are matched in the same way using the character
strings specified in the columns argument.
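The heading convention described above can be sketched in base R. This is an illustration of the matching idea under the assumption that each string in columns is looked up as a literal substring of the headings; it is not taken from the beadarray source, and the headings vector is hypothetical.

```r
# Hypothetical column headings, as they might appear in an exported file.
headings <- c("ProbeID",
              "AVG_Signal-ARRAY1", "BEAD_STDERR-ARRAY1", "Avg_NBEADS-ARRAY1",
              "AVG_Signal-ARRAY2", "BEAD_STDERR-ARRAY2", "Avg_NBEADS-ARRAY2")

# Same shape as the 'columns' argument (a subset of its default entries).
columns <- list(exprs = "AVG_Signal", se.exprs = "BEAD_STDERR",
                NoBeads = "Avg_NBEADS")

# For each entry, collect the headings that contain the string literally.
matched <- lapply(columns, function(pattern) {
  grep(pattern, headings, fixed = TRUE, value = TRUE)
})

# Array names recovered by stripping the "AVG_Signal-" prefix.
arrays <- sub("AVG_Signal-", "", matched$exprs, fixed = TRUE)
print(matched)
print(arrays)
```

If an export uses different headings, changing the strings in the columns list is the analogous adjustment.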
NOTE: With version 2 of BeadStudio it is possible to export annotation and sequence information along with the intensities. We do not recommend exporting this information, as special characters found in the annotation columns can cause problems when reading in the data. This annotation information can be retrieved later on from other Bioconductor packages.
The default object created by readBeadSummaryData is an
ExpressionSetIllumina object.
If the control intensities have been exported from BeadStudio
('ControlProbeProfile'), they may be read into beadarray as well. The
qc.skip, qc.sep and qc.columns parameters can be
used to adjust for the contents of the file. If the 'ControlGeneProfile'
is exported, you will need to set controlID="TargetID".
Sample sheet information can also be used. This is a file format used by Illumina to specify which sample has been hybridised to each array in the experiment.
Note that if the probe identifiers are non-unique, the duplicated
rows are removed. This may occur if the 'SampleGeneProfile' is
exported from BeadStudio and/or ProbeID="TargetID" is specified
(the "ProbeID" column has a unique identifier in the 'SampleProbeProfile',
whereas the "TargetID" may not, as multiple beads can target the same
transcript).
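The effect of non-unique identifiers can be sketched with base R. The table below is hypothetical, and keeping the first row for each repeated identifier is an assumption made for the sketch; the text above does not say which of the duplicated rows survives.

```r
# Hypothetical table in which two rows share a TargetID.
tab <- data.frame(
  TargetID = c("GENE_A", "GENE_B", "GENE_A", "GENE_C"),
  AVG_Signal = c(120.5, 98.1, 130.2, 75.0),
  stringsAsFactors = FALSE
)

# Flag repeated identifiers, then keep the first row for each one
# (assumption: first occurrence is retained).
dup <- duplicated(tab$TargetID)
unique_tab <- tab[!dup, ]

print(sum(dup))
print(unique_tab)
```

Checking anyDuplicated() on the identifier column before reading the data is one way to see in advance whether rows would be dropped.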
An ExpressionSetIllumina object.
Mark Dunning and Mike Smith
##code to read the example BeadStudio (version 2) output distributed with the package
#dataFile = "SampleProbeProfile.txt"
#sampleSheet = "SampleSheet.csv"
#qcFile = "ControlGeneProfile.txt"
#BSData = readBeadSummaryData(dataFile, qcFile=qcFile, sampleSheet=sampleSheet, controlID="TargetID")