readAligned {ShortRead} | R Documentation |
readAligned reads all aligned read files in a directory dirPath whose
file name matches pattern, returning a compact internal representation
of the alignments, sequences, and quality scores in the files. Methods
read all files into a single R object; a typical use is to restrict
input to a single aligned read file.
readAligned(dirPath, pattern=character(0), ...)
dirPath |
A character vector (or other object; see methods defined on this generic) giving the directory path (relative or absolute) of aligned read files to be input. |
pattern |
The (grep-style) pattern describing file names to be read. The default (character(0)) results in (attempted) input of all files in the directory. |
... |
Additional arguments, used by methods. Most methods implement filter=srFilter(), allowing objects of SRFilter to selectively return aligned reads. |
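As a rough illustration of how a grep-style pattern selects file names, the sketch below uses base R's list.files on a scratch directory. The file names are hypothetical, and the assumption that readAligned matches names the same way list.files does is ours, not something stated on this page.

```r
# Scratch directory populated with hypothetical, empty alignment files.
dirPath <- tempfile("aligned_")
dir.create(dirPath)
file.create(file.path(dirPath, c("s_1_export.txt",
                                 "s_2_export.txt",
                                 "s_5_0001_realign.txt")))

# The pattern is a regular expression, not a shell glob.
matched <- list.files(dirPath, pattern = "s_2_export.txt")

# Escaping the dot and anchoring the end makes the match stricter.
all_export <- list.files(dirPath, pattern = "_export\\.txt$")

print(matched)
print(all_export)
```

A pattern that matches exactly one file is the simplest way to restrict input to a single aligned read file.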
There is no standard aligned read file format; methods parse particular file types.
The readAligned,character-method interprets file types based on an
additional type argument. Supported types are:
type="SolexaExport"
This type parses .*_export.txt files following the
documentation in the Solexa Genome Alignment software manual,
version 0.3.0. These files consist of the following columns;
consult Solexa documentation for precise descriptions. If parsed,
values can be retrieved from AlignedRead as follows:
alignData (five columns), sread, quality, chromosome, position, strand, alignQuality, alignData
Paired read columns are not interpreted. The resulting AlignedRead
object does not contain a meaningful id; instead, use information from
alignData to identify reads.
Different interfaces to reading alignment files are described in
SolexaPath and SolexaSet.
type="SolexaPrealign"
type="SolexaAlign"
type="SolexaRealign"
These types parse s_L_TTTT_prealign.txt, s_L_TTTT_align.txt or
s_L_TTTT_realign.txt files produced by default and eland analyses.
From the Solexa documentation, align corresponds to unfiltered
first-pass alignments, prealign adjusts alignments for error rates
(when available), and realign filters alignments to exclude
clusters failing to pass quality criteria.
Because base quality scores are not stored with alignments, the
object returned by readAligned scores all base qualities as -32.
If parsed, values can be retrieved from AlignedRead as follows:
sread, alignQuality, alignData, position, strand, readXStringColumns, alignData
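One possible reading of the -32 base quality reported for the prealign, align and realign types, offered as an assumption rather than a description of the implementation: under Solexa-style encoding a quality character decodes to its ASCII code minus 64, so a blank placeholder character (ASCII 32) would decode to -32. The arithmetic can be checked in base R:

```r
# Assumption for this sketch: Solexa-style encoding with an offset of 64,
# and a space character used as the placeholder quality.
placeholder <- " "
offset <- 64L

score <- utf8ToInt(placeholder) - offset
print(score)
```

Whatever the mechanism, the value is a placeholder; it carries no information about base-call confidence.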
type="MAQMap", records=-1L
Parses map files produced by MAQ. See details in the next section. The
records option determines how many lines are read; -1L (the default)
means that all records are input.
type="MAQMapShort", records=-1L
As type="MAQMap", but for map files made with Maq prior to version 0.7.0.
(These files use a different maximum read length [64 instead of 128], and
are hence incompatible with newer Maq map files.)
type="MAQMapview"
Parse alignment files created by MAQ's ‘mapview’ command. Interpretation of columns is based on the description in the MAQ manual, specifically:
...each line consists of read name, chromosome, position, strand, insert size from the outer coordinates of a pair, paired flag, mapping quality, single-end mapping quality, alternative mapping quality, number of mismatches of the best hit, sum of qualities of mismatched bases of the best hit, number of 0-mismatch hits of the first 24bp, number of 1-mismatch hits of the first 24bp on the reference, length of the read, read sequence and its quality.
The read name, read sequence, and quality are read as XStringSet
objects. Chromosome and strand are read as factors. Position and
mapping quality are numeric. These fields are mapped to their
corresponding representation in AlignedRead objects.
Number of mismatches of the best hit, sum of qualities of mismatched
bases of the best hit, number of 0-mismatch hits of the first 24bp, and
number of 1-mismatch hits of the first 24bp are represented in the
AlignedRead object as components of alignData.
Remaining fields are currently ignored.
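To make the mapview column layout concrete, the following base R sketch parses one record into typed columns. The record and the column labels are hypothetical, chosen to follow the field order quoted from the MAQ manual above. The sketch does not show how readAligned itself parses these files.

```r
# One hypothetical mapview record (tab-separated, 16 fields); not real data.
line <- paste(c("read_1", "chr1.fa", "100", "+", "0", "0", "60", "60", "0",
                "1", "20", "1", "0", "8", "ACGTACGT", "IIIIIIII"),
              collapse = "\t")

# Column labels invented for this sketch, in the order given by the MAQ manual.
cols <- c("id", "chromosome", "position", "strand", "insert_size",
          "paired_flag", "map_qual", "se_map_qual", "alt_map_qual",
          "n_mismatch", "mismatch_qual_sum", "n_hits_0mm", "n_hits_1mm",
          "read_length", "sequence", "quality")

# quote = "" and comment.char = "" keep quality characters such as '#' intact.
mv <- read.table(text = line, sep = "\t", quote = "", comment.char = "",
                 col.names = cols,
                 colClasses = c("character", "character", "integer",
                                "character", rep("integer", 10),
                                "character", "character"))

# Chromosome and strand as factors, mirroring the description above.
mv$chromosome <- factor(mv$chromosome)
mv$strand <- factor(mv$strand, levels = c("+", "-"))

str(mv)
```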
A single R object (e.g., AlignedRead) containing alignments, sequences
and qualities of all files in dirPath matching pattern. There is no
guarantee of order in which files are read.
Martin Morgan <mtmorgan@fhcrc.org>, Simon Anders <anders@ebi.ac.uk> (MAQ map)
An AlignedRead object.
The MAQ reference manual, http://maq.sourceforge.net/maq-manpage.shtml#5, 3 May, 2008
sp <- SolexaPath(system.file("extdata", package="ShortRead"))
ap <- analysisPath(sp)
## ELAND_EXTENDED
readAligned(ap, "s_2_export.txt", "SolexaExport")
## PhageAlign
readAligned(ap, "s_5_.*_realign.txt", "SolexaRealign")
## MAQ
dirPath <- system.file('extdata', 'maq', package='ShortRead')
list.files(dirPath)
## First line
readLines(list.files(dirPath, full.names=TRUE)[[1]], 1)
countLines(dirPath)
## two files collapse into one
readAligned(dirPath, type="MAQMapview")
## select only chr1-5.fa, '+' strand
filt <- compose(chromosomeFilter("chr[1-5].fa"), strandFilter("+"))
readAligned(sp, "s_2_export.txt", filter=filt)
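The accessors named in the type descriptions above can be applied to the returned object. The sketch below reuses the calls from the examples and is guarded so that it only runs where ShortRead is installed. It assumes the installed version still ships the example data and these legacy functions.

```r
if (requireNamespace("ShortRead", quietly = TRUE)) {
    library(ShortRead)

    sp <- SolexaPath(system.file("extdata", package = "ShortRead"))
    aln <- readAligned(analysisPath(sp), "s_2_export.txt", "SolexaExport")

    print(aln)                      # summary of the AlignedRead object
    print(head(sread(aln)))         # read sequences
    print(head(quality(aln)))       # base qualities
    print(head(chromosome(aln)))    # match chromosome
    print(head(position(aln)))      # match position
    print(table(strand(aln)))       # match strand
    print(head(alignQuality(aln)))  # alignment score
    print(alignData(aln))           # remaining parsed columns
}
```

Because the SolexaExport id is not meaningful, alignData is the place to look when individual reads need to be identified.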