extract.lib {sagenhaft} R Documentation

Functions for SAGE library extraction

Description

Functions to extract the tags in a library from sequences or base-caller output.

Usage

extract.lib.from.zip(zipfile, libname=sub(".zip","",basename(zipfile)),
                     ...)
extract.lib.from.directory(dirname, libname=basename(dirname),
                           pattern, ...)
extract.library.tags(filelist, base.caller.format="phd",
                     remove.duplicate.ditags=TRUE, 
                     remove.N=FALSE, remove.low.quality=10,
                     taglength=10, min.ditag.length=(2*taglength-2),
                     max.ditag.length=(2*taglength+4),
                     cut.site="catg", default.quality=NA, verbose=TRUE,
                     ...) 
reestimate.lib.from.tagcounts(tagcounts, libname, default.quality=20, ...) 
compute.unique.tags(lib)
combine.libs(..., artifacts=c("Linker", "Ribosomal", "Mitochondrial"))
remove.sage.artifacts(lib,
                      artifacts=c("Linker","Ribosomal","Mitochondrial"),
                      ...)
read.phd.file(file)
read.seq.qual.filepair(file, default.quality=NA)
extract.ditags(sequence, taglength=10, filename=NA,
               min.ditag.length=(2*taglength-2),
               max.ditag.length=(2*taglength+4), cut.site="catg")
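
The default arguments in the signatures above are ordinary R expressions, so they can be evaluated in base R without the package to see what a call will use. The following is a minimal sketch: the ZIP path is a hypothetical example, and ditag.bounds is a helper introduced here for illustration only, not a sagenhaft function.

```r
# Default library name, using the same expression as in extract.lib.from.zip()
zipfile <- "data/E15postHFI.zip"   # hypothetical relative path
libname <- sub(".zip", "", basename(zipfile))
libname

# Default ditag length bounds, using the same expressions as in
# extract.library.tags() and extract.ditags()
ditag.bounds <- function(taglength) {   # hypothetical helper, not in sagenhaft
  c(min = 2 * taglength - 2, max = 2 * taglength + 4)
}
ditag.bounds(10)
ditag.bounds(17)
```

Because sub() treats ".zip" as a regular expression in which "." matches any character, it may be safer to pass libname explicitly when a file name contains "zip" elsewhere in its name.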

Arguments

zipfile,dirname Name of a ZIP file or a directory that contains base-caller output files
libname A character string to be assigned as the library name
pattern Regular expression to specify pattern for the files that will be read
filelist List of files to be read
base.caller.format Format of the base-caller output: either "phd" or "seq", or a character vector with one entry per file in filelist
remove.duplicate.ditags Remove duplicate ditags. TRUE or FALSE
remove.N Remove all tags that contain N. TRUE or FALSE
remove.low.quality Remove all tags with an average quality score below remove.low.quality. The filter is skipped if the value is less than 0
taglength Length of tags. Usually 10 or 17
min.ditag.length,max.ditag.length Minimum and maximum length for ditags
cut.site Restriction enzyme cut site. Usually CATG
verbose Display information during process
lib Library object
file,filename Character string indicating file name
default.quality Quality value to use on sequences, if quality files are missing
sequence Construct containing sequence and quality values returned by read.phd.file or read.seq.qual.filepair
artifacts Types of artificially generated tags to remove.
... Arguments passed on to extraction functions.
tagcounts Tag counts from a library. Integer vector with tag sequences as names.
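
As an illustration of the tagcounts argument described above, the following sketch builds an integer vector named by tag sequences in base R. The tag sequences are made up for the example, and the commented call at the end only repeats the signature from the Usage section; it assumes the package is installed and has not been run here.

```r
# Hypothetical observed tags (10 bases each), with one tag seen twice
tags <- c("ACGTACGTAC", "AAAAAAAAAA", "ACGTACGTAC")

# Count occurrences and convert to a named integer vector
tab <- table(tags)
tagcounts <- setNames(as.integer(tab), names(tab))
tagcounts

# Basic checks on the shape described in the Arguments section
is.integer(tagcounts)
all(nchar(names(tagcounts)) == 10)

# With sagenhaft loaded, a call would presumably look like:
# lib <- reestimate.lib.from.tagcounts(tagcounts, libname = "example")
```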

Details

Use extract.lib.from.zip or extract.lib.from.directory to extract the SAGE tags from the sequences of a library. The sequences must be provided as output files from the base-caller software, either in a ZIP archive or in a directory. These are usually the only functions that the user should call directly. The other functions are called by these two and should be used directly only by experienced users who need finer control over the process. Most arguments are passed on, so they can be specified in the high-level functions. ZIP file names must be specified as relative path names.
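
To make the role of taglength, min.ditag.length, max.ditag.length and cut.site more concrete, the following base R sketch splits a read at the cut site, keeps the pieces whose length lies within the ditag bounds, and takes one tag from each end. This is an illustration only and not the sagenhaft implementation: sketch_ditags and revcomp are hypothetical helpers, the read is made up, quality values are ignored, and the input is a plain character string rather than the construct returned by read.phd.file.

```r
# Reverse complement of lower-case DNA strings (hypothetical helper)
revcomp <- function(x) {
  vapply(strsplit(chartr("acgt", "tgca", x), ""),
         function(ch) paste(rev(ch), collapse = ""), character(1))
}

# Illustrative ditag splitting; NOT the package's algorithm
sketch_ditags <- function(read, taglength = 10,
                          min.ditag.length = 2 * taglength - 2,
                          max.ditag.length = 2 * taglength + 4,
                          cut.site = "catg") {
  s <- tolower(read)
  site <- tolower(cut.site)
  starts <- gregexpr(site, s, fixed = TRUE)[[1]]
  # At least two cut sites are needed to delimit one ditag
  if (length(starts) < 2) return(character(0))
  from <- head(starts, -1) + nchar(site)   # first base after each cut site
  to <- tail(starts, -1) - 1               # last base before the next one
  ditags <- substring(s, from, to)
  len <- nchar(ditags)
  ditags[len >= min.ditag.length & len <= max.ditag.length]
}

# Made-up read: one 20-base ditag and one 5-base fragment that is too short
read <- paste0("nn", "catg", "acgtacgtacggttggttga", "catg", "ggggg", "catg", "nn")
d <- sketch_ditags(read)
d

# One tag from each end; the second is read from the opposite strand
tag1 <- substring(d, 1, 10)
tag2 <- revcomp(substring(d, nchar(d) - 9, nchar(d)))
c(tag1, tag2)
```

In this sketch the 5-base fragment is discarded because it is shorter than the default minimum of 18, which is the kind of filtering the length arguments are meant to control.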

Value

lib returns a SAGE library object.

Author(s)

Tim Beissbarth

References

http://tagcalling.mbgproject.org

See Also

sage.library, error.correction

Examples

#library(sagenhaft)
#file.copy(system.file("data/E15postHFI.zip",package="sagenhaft"),
#          "E15postHFI.zip")
#E15post<-extract.lib.from.zip("E15postHFI.zip", taglength=10,
#                              min.ditag.length=20, max.ditag.length=24)
#E15post

[Package sagenhaft version 1.12.0 Index]