getSeq {BSgenome}    R Documentation
Extract the sequence delimited by start and end positions relative to the full sequence of a given genome.
getSeq(bsgenome, seqname, start=NA, end=NA, as.BStringViews=FALSE)
bsgenome |
A BSgenome object, such as one of those found in the BSgenome data packages. See the available.genomes function for more details. |
seqname |
[TODO: Document me] |
start |
[TODO: Document me] |
end |
[TODO: Document me] |
as.BStringViews |
[TODO: Document me] |
If length(seqname) == 1, a character vector (as.BStringViews=FALSE) or a BStringViews object (as.BStringViews=TRUE).
If length(seqname) != 1, a character vector (as.BStringViews is ignored).
getSeq is very efficient when used with as.BStringViews=TRUE (this only works when seqname is of length 1), because in this case the sequence data are not copied.
Otherwise, the data are copied. Be aware that this can be very inefficient
if the returned vector contains very long strings (> 1 million letters)
or is itself very long (> 10000 strings).
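To make the cost of copying concrete, here is a minimal base R sketch that does not use BSgenome at all. It relies on substring() as a stand-in, on the assumption that getSeq recycles start and end in a similar way when it returns a character vector. The toy string chrom and the variable names are illustrative only.

```r
# Toy stand-in for a chromosome; real chromosomes are millions of letters long.
chrom <- paste(rep("ACGTTGCA", 5), collapse = "")

# Vectorized extraction: 'first' (1) is recycled over 'last' (1:4).
# Each element of the result is a fresh copy of the letters it covers.
prefixes <- substring(chrom, 1, 1:4)
print(prefixes)

# Fixed-width windows of 4 letters at regularly spaced start positions.
starts <- seq(1, by = 8, length.out = 5)
windows <- substring(chrom, starts, starts + 3)
print(windows)

# The number of letters copied grows with both the number of windows
# and their width.
copied <- sum(nchar(windows))
print(copied)
```

With millions of letters per window, or tens of thousands of windows, the copied total becomes large, which is the situation the paragraph above warns about.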
H. Pages
available.genomes, BSgenome-class, seqnames, substr, subBString, BStringViews, DNAString
# Load the Caenorhabditis elegans genome (UCSC Release ce2):
library(BSgenome.Celegans.UCSC.ce2)

# Look at the index of sequences:
Celegans

# Get the first 20 bases of each chromosome:
getSeq(Celegans, seqnames(Celegans), 1, 20)

# Some sequences starting at pos 1 in chromosome V:
getSeq(Celegans, "chrV", 1, 1:4)

# Omitting the 'start' (or the 'end') argument is equivalent
# to starting at the first nucleotide (or ending at the last
# nucleotide):
getSeq(Celegans, "chrV", , 1:4)
getSeq(Celegans, "chrV", 20922200)

# Never try this:
#getSeq(Celegans, "chrV")
# or this (even worse):
#getSeq(Celegans, seqnames(Celegans))
# unless you want to see millions of screens filled with A, C, G and T
# and kill your system.

# Get the 10-base sequences starting at positions 150, 250, ..., 1250
# in chromosome V:
starts <- seq(150, by=100, length.out=12)
getSeq(Celegans, "chrV", starts, starts + 9)

# The same returned as a BStringViews object (_much_ faster, no data is copied):
getSeq(Celegans, "chrV", starts, starts + 9, as.BStringViews=TRUE)

# The display of a BStringViews object stays compact:
starts <- seq(-10, by=1000, length.out=50)
getSeq(Celegans, "chrV", starts, starts + 1002, as.BStringViews=TRUE)
# Note that those views have overlaps of 3 nucleotides.

# Get the whole chromosome V sequence as a DNAString object:
getSeq(Celegans, "chrV", as.BStringViews=TRUE)[[1]]