getSeq {BSgenome}    R Documentation

getSeq

Description

Extract the sequence delimited by a start and an end position relative to the full sequence of a given genome.

Usage

  getSeq(bsgenome, seqname, start=NA, end=NA, as.BStringViews=FALSE)

Arguments

bsgenome A BSgenome object, such as one of those found in the BSgenome data packages. See the available.genomes function for more details.
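seqname A character vector of sequence names, such as "chrV" or the names returned by seqnames(bsgenome).
start The start position(s) of the sequence(s) to extract, relative to the full sequence. If omitted (NA), extraction starts at the first nucleotide.
end The end position(s) of the sequence(s) to extract, relative to the full sequence. If omitted (NA), extraction ends at the last nucleotide.
as.BStringViews A logical. If TRUE and seqname is of length 1, the result is returned as a BStringViews object instead of a character vector.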

Details
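
The start and end conventions used in the Examples section (an omitted start means the first position, an omitted end means the last position, and both are vectors that can yield several sequences at once) can be mimicked on a plain character string with base R. The following is only a toy sketch: toy_chr and toy_get_seq are hypothetical names introduced for illustration, and the code is not how getSeq is implemented.

```r
# Toy stand-in for a chromosome sequence (made-up data, not a real genome).
toy_chr <- "ACGTACGTTTGACCA"

# Hypothetical helper that mimics the start/end conventions on a plain string.
# An NA start is replaced by 1; an NA end is replaced by the string length.
toy_get_seq <- function(x, start = NA, end = NA) {
  start <- ifelse(is.na(start), 1L, start)
  end   <- ifelse(is.na(end), nchar(x), end)
  # substring() is vectorized over its start and end arguments.
  substring(x, start, end)
}

# Several sequences starting at position 1:
toy_get_seq(toy_chr, 1, 1:4)

# Omitting 'start' is equivalent to starting at the first letter:
toy_get_seq(toy_chr, , 1:4)

# Omitting 'end' is equivalent to ending at the last letter:
toy_get_seq(toy_chr, 12)
```

Like getSeq with as.BStringViews=FALSE, this sketch returns copies of the data as a character vector, which is the situation the Note below warns about for very long or very numerous sequences.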

Value

If length(seqname) == 1, a character vector (as.BStringViews=FALSE) or a BStringViews object (as.BStringViews=TRUE). If length(seqname) != 1, a character vector (as.BStringViews is ignored).

Note

getSeq is very efficient when used with as.BStringViews=TRUE (this only works when seqname is of length 1), because in this case the sequence data are not copied. Otherwise, the data are copied. Be aware that this can be very inefficient if the returned vector contains very long strings (> 1 million letters) or is itself very long (> 10000 strings).

Author(s)

H. Pages

See Also

available.genomes, BSgenome-class, seqnames, substr, subBString, BStringViews, DNAString

Examples

  # Load the Caenorhabditis elegans genome (UCSC Release ce2):
  library(BSgenome.Celegans.UCSC.ce2)

  # Look at the index of sequences:
  Celegans

  # Get the first 20 bases of each chromosome:
  getSeq(Celegans, seqnames(Celegans), 1, 20)

  # Some sequences starting at pos 1 in chromosome V:
  getSeq(Celegans, "chrV", 1, 1:4)

  # Omitting the 'start' (or the 'end') argument is equivalent
  # to starting at the first nucleotide (or ending at the last
  # nucleotide):
  getSeq(Celegans, "chrV", , 1:4)
  getSeq(Celegans, "chrV", 20922200)

  # Never try this:
  #getSeq(Celegans, "chrV")
  # or this (even worse):
  #getSeq(Celegans, seqnames(Celegans))
  # unless you want to see millions of screens filled with A, C, G and T
  # and kill your system.

  # Get the 10-base sequences starting at positions 150, 250, ..., 1250
  # in chromosome V:
  starts <- seq(150, by=100, length.out=12)
  getSeq(Celegans, "chrV", starts, starts + 9)

  # The same returned as a BStringViews object (_much_ faster, no data is copied):
  getSeq(Celegans, "chrV", starts, starts + 9, as.BStringViews=TRUE)

  # The display of a BStringViews object stays compact:
  starts <- seq(-10, by=1000, length.out=50)
  getSeq(Celegans, "chrV", starts, starts + 1002, as.BStringViews=TRUE)
  # Note that those views have overlaps of 3 nucleotides.

  # Get the whole chromosome V sequence as a DNAString object:
  getSeq(Celegans, "chrV", as.BStringViews=TRUE)[[1]]

[Package BSgenome version 1.6.2 Index]