readFASTA {Biostrings}    R Documentation

Functions to read/write FASTA formatted files

Description

FASTA is a simple file format for biological sequence data. A file may contain one or more sequences; each sequence is preceded by a description line that begins with a ">".
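As an illustration of the layout, the following base R sketch writes a small two-record file to a temporary location and counts its description lines. It does not use Biostrings, and the record names and sequences are invented for the example.

```r
# Two invented records: each starts with a ">" description line,
# followed by one or more lines of sequence letters.
fasta_lines <- c(
  ">seq1 example DNA record",
  "ACGTACGTAC",
  "GTACGT",
  ">seq2 another record",
  "TTGACCA"
)
path <- tempfile(fileext = ".fsa")
writeLines(fasta_lines, path)

# Read the file back line by line; description lines are those
# whose first character is ">".
lines <- readLines(path)
is_desc <- startsWith(lines, ">")
sum(is_desc)  # number of sequences in the file
```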

Usage

  readFASTA(file, checkComments=TRUE, strip.desc=FALSE)
  writeFASTA(x, file="", width=80)

Arguments

file Either a character string naming a file or a connection open for reading (readFASTA) or writing (writeFASTA). If "" (the default for writeFASTA), writeFASTA writes to the standard output connection (the console) unless redirected by sink.
checkComments Whether comments (lines beginning with a semicolon, ";") should be found and removed.
strip.desc Whether the ">" marking the beginning of each description line should be removed.
x A list of the form returned by readFASTA.
width The maximum number of letters per line of sequence.
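The effect of checkComments, strip.desc and width can be sketched in base R. The helpers read_fasta_sketch and write_fasta_sketch below are hypothetical and are not the Biostrings implementation. They assume that each record's sequence lines are concatenated into one string, that the sequence component is named seq (only desc is confirmed by the Examples), and that the writer re-adds a missing ">" before each description.

```r
# Hypothetical helpers, NOT the Biostrings implementation: a minimal
# base R sketch of the behaviour documented for checkComments,
# strip.desc and width.
read_fasta_sketch <- function(file, checkComments = TRUE, strip.desc = FALSE) {
  lines <- readLines(file)
  if (checkComments) {
    # Drop comment lines, i.e. lines beginning with ";".
    lines <- lines[!startsWith(lines, ";")]
  }
  desc_idx <- which(startsWith(lines, ">"))
  # Each record ends just before the next description line (or at EOF).
  ends <- c(desc_idx[-1] - 1L, length(lines))
  lapply(seq_along(desc_idx), function(i) {
    desc <- lines[desc_idx[i]]
    if (strip.desc) desc <- substring(desc, 2)
    # Concatenate all sequence lines of this record into one string.
    seq_lines <- if (ends[i] > desc_idx[i]) {
      lines[(desc_idx[i] + 1L):ends[i]]
    } else {
      character(0)
    }
    # Assumption: the sequence component is named "seq".
    list(desc = desc, seq = paste(seq_lines, collapse = ""))
  })
}

write_fasta_sketch <- function(x, file = "", width = 80) {
  out <- unlist(lapply(x, function(rec) {
    desc <- rec$desc
    # Assumption: re-add the ">" if it was stripped on reading.
    if (!startsWith(desc, ">")) desc <- paste0(">", desc)
    n <- nchar(rec$seq)
    if (n == 0) return(desc)
    # Split the sequence into chunks of at most `width` letters.
    starts <- seq(1, n, by = width)
    c(desc, substring(rec$seq, starts, starts + width - 1))
  }))
  # file = "" sends output to the console, as cat() does.
  cat(paste0(out, "\n"), file = file, sep = "")
}

# Usage: a small file with one comment line and two records.
path <- tempfile(fileext = ".fsa")
writeLines(c("; a comment line", ">seq1 first", "ACGT", "ACG",
             ">seq2 second", "TTGA"), path)
recs <- read_fasta_sketch(path, strip.desc = TRUE)
out_path <- tempfile(fileext = ".fsa")
write_fasta_sketch(recs, out_path, width = 3)
readLines(out_path)
```

Wrapping with substring() at fixed offsets keeps each sequence as a single string in memory and splits it into lines only on output.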

Details

FASTA is a widely used format in biology and a relatively simple markup, but I am not aware of a formal standard for it. It would be useful to check that the parsed data are sequences of an appropriate type, but without a standard that does not seem possible.

Many other packages provide similar, but different, capabilities. The one in the seqinr package seems most similar, but it splits each biological sequence into a vector of single-character strings, which is too inefficient for large problems.
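The efficiency concern can be illustrated in base R. The sketch below compares the memory used by one character string of 100,000 letters with the same data stored as a vector of single-character strings. The sequence is randomly generated and purely illustrative, and exact sizes depend on the platform.

```r
# A randomly generated DNA sequence of 100,000 letters (illustrative only).
set.seed(1)
n <- 1e5
one_string <- paste(sample(c("A", "C", "G", "T"), n, replace = TRUE),
                    collapse = "")
# The same sequence split into single-character strings.
split_chars <- strsplit(one_string, "")[[1]]

length(split_chars)        # one element per letter
object.size(one_string)    # roughly one byte per letter
object.size(split_chars)   # at least one pointer (typically 8 bytes) per letter
```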

Value

A list with one element for each sequence in the file. Each element has two parts: the description (desc) and a character string holding the biological sequence.
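The shape of the returned list can be mimicked by hand, as in the sketch below. The descriptions and sequences are invented, and the component name seq is an assumption; desc is the name used in the Examples.

```r
# A hand-built list mimicking the documented return value: one element per
# sequence, each holding a description and the sequence as one string.
# The component name "seq" is an assumption; "desc" matches the Examples.
ff <- list(
  list(desc = "ORF1 forward", seq = "ATGGCTAA"),
  list(desc = "ORF2 reverse complement", seq = "ATGTTTTGA")
)

desc <- sapply(ff, function(x) x$desc)         # all descriptions
lens <- sapply(ff, function(x) nchar(x$seq))   # sequence lengths
```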

Author(s)

R. Gentleman

See Also

read.BStringViews, write.BStringViews, BStringViews-class, read.table, scan, write.table

Examples

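## Path to an example FASTA file shipped with Biostrings
f1 <- system.file("Exfiles/someORF.fsa", package="Biostrings")
## Read all records, removing the leading ">" from each description
ff <- readFASTA(f1, strip.desc=TRUE)
## Extract the description of every record
desc <- sapply(ff, function(x) x$desc)
## Keep the "reverse complement" sequences only
ff2 <- ff[grep("reverse complement", desc, fixed=TRUE)]
## Write the selected records to a file in the session's temporary directory
writeFASTA(ff2, file.path(tempdir(), "someORF2.fsa"))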

[Package Biostrings version 2.6.6 Index]