athPkgBuilder {AnnBuilder} | R Documentation |
These functions are implemented specifically for building annotation data packages for Arabidopsis using The Arabidopsis Information Resource (TAIR).
athPkgBuilder(baseName = NULL, pkgName, pkgPath,
    fileExt = list(
        base = "Microarrays/Affymetrix/affy_ATH1_array_elements-2006-07-14.txt",
        estAssign = "Genes/est_mapping/est.Assignment.Locus",
        seqGenes = "Genes/TAIR_sequenced_genes",
        go = "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20050827.txt",
        aliases = "Genes/gene_aliases.20041105",
        aracyc = "Pathways/aracyc_dump_20050412",
        kegg = "/ath/ath_gene_map.tab",
        pmid = "User_Requests/LocusPublished.08012006.txt"),
    ncols = list(base = 9, estAssign = 7, seqGenes = 4, go = 12,
        aliases = 4, aracyc = 4, kegg = 2, pmid = 4),
    cols2Keep = list(base = c(1, 5), estAssign = c(3, 6, 7),
        seqGenes = c(1, 3, 4), go = c(1, 5, 9), aliases = c(1, 2),
        aracyc = c(1, 3, 4), kegg = c(1, 2), pmid = c(1, 4)),
    colNames = list(base = c("PROBE", "ACCNUM"),
        estAssign = c("CHRLOC", "ORI", "ACCNUM"),
        seqGenes = c("ACCNUM", "CHR", "GENENAME"),
        go = c("ACCNUM", "GO", "EVID"),
        aliases = c("ACCNUM", "SYMBOL"),
        aracyc = c("ARACYC", "ENZYME", "ACCNUM"),
        kegg = c("ACCNUM", "PATH"),
        pmid = c("ACCNUM", "PMID")),
    indexby = "PROBE", version, author, lazyLoad = TRUE)

getOneMap(map, keyCol)
procPMIDData(pmid)
getSrcObjs4Ath()
readAthData(baseUrl, ext, col2Keep, colNames, ncols)
mergeDupMatByFirstCol(dupMat, sep = ";")
getFileExt(chipName = "ATH1", verbose = FALSE)
baseName |
baseName a character string for the name of the
base file to be used to build an annotation data package. The base
file is assumed to have two columns, the first holding probe
ids and the second the corresponding TAIR locus ids. If no input
is given, the file pointed to by the base element of fileExt is
used |
pkgName |
pkgName a character string for the name of the
data package to be built |
pkgPath |
pkgPath a character string for the path to a
directory where the data package to be built will be stored |
fileExt |
fileExt a list of character strings, each an
extension to be appended to a base url to form a complete url for a
desired source data file stored at TAIR's ftp site. Some of the
names given as default will change with time and need to be updated.
The input value of fileExt can be generated by getFileExt |
ncols |
ncols an integer indicating the total number of
columns of a given source data file (athPkgBuilder takes a named
list with one such value per source file) |
cols2Keep |
cols2Keep a vector of integers indicating
which of the columns of a given source data file will be retained
when the source file is read (athPkgBuilder takes a named list with
one such vector per source file) |
colNames |
colNames a vector of character strings for the
names of the columns of the source file to be retained
(athPkgBuilder takes a named list with one such vector per source
file) |
indexby |
indexby whether to use the probe set ID or the TAIR
locus ID to index most annotations; either "PROBE" (default) or
"ACCNUM" |
version |
version a character string for the version
number of the data package to be built |
author |
author a list of character strings with an author
and a maintainer element for the name and email address of the author |
baseUrl |
baseUrl a character string for the base url to
TAIR's ftp site. The default is
ftp://tairpub:tairpub@ftp.arabidopsis.org/home/tair/ |
map |
map a matrix containing mappings between probe ids
and annotation data |
keyCol |
keyCol an integer index or a character string name
identifying the column in a matrix that contains the keys by which
data in the other columns will be merged for duplicated keys |
pmid |
pmid a matrix containing mappings between probe ids
and PubMed ids regarding genes represented by the probe ids |
ext |
ext a single character string of the same form as an element of fileExt |
dupMat |
dupMat a matrix with duplicated values in the
column that serves as keys |
sep |
sep a character string for the separator to be used
when values in a matrix are merged based on keys contained in another
column |
col2Keep |
col2Keep a vector of integers indicating which
of the columns of a data file will be kept when the file is read |
lazyLoad |
lazyLoad a boolean indicating whether a lazy
load database will be created |
chipName |
chipName Affymetrix chip name, either ATH1 (the
default shown in the usage) or AG |
verbose |
verbose logical; whether to give verbose output
from getFileExt |
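The key-based merging described for dupMat, keyCol and sep can be pictured with a small base R sketch. The function name collapseByFirstCol and the probe ids below are hypothetical and are not part of AnnBuilder; the actual behaviour of mergeDupMatByFirstCol (for example its treatment of NA or repeated values) may differ.

```r
# Hypothetical helper, not the AnnBuilder implementation: collapse the rows
# of a character matrix that share the same key in the first column.
collapseByFirstCol <- function(dupMat, sep = ";") {
  keys <- unique(dupMat[, 1])
  merged <- t(vapply(keys, function(key) {
    rows <- dupMat[dupMat[, 1] == key, , drop = FALSE]
    # For every non-key column, join the distinct values with 'sep'.
    vals <- apply(rows[, -1, drop = FALSE], 2,
                  function(x) paste(unique(x), collapse = sep))
    c(key, vals)
  }, character(ncol(dupMat))))
  dimnames(merged) <- list(NULL, colnames(dupMat))
  merged
}

# Made-up probe ids: "p1" appears twice and is merged into a single row.
m <- cbind(PROBE = c("p1", "p1", "p2"),
           GO    = c("GO:0001", "GO:0002", "GO:0003"))
res <- collapseByFirstCol(m)
```

Here res has one row per distinct probe id, and the GO entry for "p1" holds both terms joined by the separator.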
The annotation data will be extracted from various sources that may
change in both names and contents. The default values provided were
correct at the time of implementation but may need updating when the
function is actually used. getFileExt helps to generate the
up-to-date value for parameter fileExt in athPkgBuilder.
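Because the defaults may need updating by hand, it can help to check that the per-source settings still agree with each other before starting a build. The following is a minimal sketch in base R. The helper checkSourceSpec is hypothetical (not part of AnnBuilder), and the assumption that an extension is appended directly to the base url is inferred from the description of fileExt rather than from the package source.

```r
# Default base url quoted in the description of baseUrl.
baseUrl <- "ftp://tairpub:tairpub@ftp.arabidopsis.org/home/tair/"

# Two of the default entries from the usage, copied for illustration.
fileExt   <- list(go = "Ontologies/Gene_Ontology/ATH_GO_GOSLIM.20050827.txt",
                  aliases = "Genes/gene_aliases.20041105")
ncols     <- list(go = 12, aliases = 4)
cols2Keep <- list(go = c(1, 5, 9), aliases = c(1, 2))
colNames  <- list(go = c("ACCNUM", "GO", "EVID"),
                  aliases = c("ACCNUM", "SYMBOL"))

# Hypothetical check: TRUE for each source whose settings are consistent,
# i.e. one name per retained column and no index beyond the column count.
checkSourceSpec <- function(ncols, cols2Keep, colNames) {
  vapply(names(ncols), function(src) {
    length(cols2Keep[[src]]) == length(colNames[[src]]) &&
      all(cols2Keep[[src]] >= 1) &&
      max(cols2Keep[[src]]) <= ncols[[src]]
  }, logical(1))
}

ok <- checkSourceSpec(ncols, cols2Keep, colNames)

# Assumed url construction: extension pasted directly after the base url.
urls <- vapply(fileExt, function(ext) paste0(baseUrl, ext), character(1))
```

A FALSE entry in ok would point to a source whose ncols, cols2Keep and colNames settings no longer match after an edit.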
The main function athPkgBuilder returns invisible()
Jianhua Zhang
# No example is provided due to the length of time required to build a package
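For orientation only, a call could be assembled along the following lines. This is a sketch under stated assumptions: the package name, version and author details are placeholders, the build is guarded by a flag so that nothing is downloaded by default, and it assumes that athPkgBuilder and getFileExt are exported by an installed copy of AnnBuilder.

```r
# Arguments named in the usage; every value here is a placeholder.
args <- list(
  pkgName  = "myAthAnnot",   # hypothetical package name
  pkgPath  = tempdir(),      # directory where the built package is stored
  version  = "1.0.0",
  author   = list(author = "A. Author",
                  maintainer = "A. Author <author@example.org>"),
  indexby  = "PROBE",
  lazyLoad = TRUE
)

runBuild <- FALSE  # set to TRUE only when a long-running build is intended

if (runBuild && requireNamespace("AnnBuilder", quietly = TRUE)) {
  # Refresh the source file names first, as the details recommend.
  args$fileExt <- AnnBuilder::getFileExt(chipName = "ATH1")
  do.call(AnnBuilder::athPkgBuilder, args)
}
```

With runBuild left at FALSE the block only constructs the argument list and contacts no remote site.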