yeastAnn {AnnBuilder} | R Documentation |
Given a GEO accession number for a yeast data set and the extensions for the annotation data file names that are available from the Yeast Genome web site, these functions generate a data package containing annotation data for the yeast genes in the GEO data set.
yeastAnn(base = "", yGenoUrl,
    yGenoNames = c("literature_curation/gene_literature.tab",
        "chromosomal_feature/SGD_features.tab",
        "literature_curation/gene_association.sgd.gz"),
    toKeep = list(c(6, 1), c(1, 5, 9, 10, 12, 16, 6), c(2, 5, 7)),
    colNames = list(c("sgdid", "pmid"),
        c("sgdid", "genename", "chr", "chrloc", "chrori", "description", "alias"),
        c("sgdid", "go")),
    seps = c("\t", "\t", "\t"), by = "sgdid")
getProbe2SGD(probe2ORF = "", yGenoUrl,
    fileName = "literature_curation/orf_geneontology.tab",
    toKeep = c(1, 7), colNames = c("orf", "sgdid"), sep = "\t", by = "orf")
procYeastGeno(baseURL, fileName, toKeep, colNames, seps = "\t")
getGEOYeast(GEOAccNum, GEOUrl, geoCols = c(1, 8), yGenoUrl)
formatGO(gos, evis)
formatChrLoc(chr, chrloc, chrori)
getYGExons(srcUrl, yGenoName = "chromosomal_feature/intron_exon.tab", sep = "\t")
base |
a file name for a matrix with two columns.
The first column contains probe ids and the second contains the mappings to
the SGD ids used by all the Yeast Genome data files. If base = "",
the whole genome will be mapped based on a data file that contains
mappings between all the ORFs and SGD ids |
GEOAccNum |
a character string for the accession
number given by GEO for a yeast data set |
GEOUrl |
a character string for the url of the
CGI script common to all the GEO data. Currently it is
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi? |
geoCols |
a vector of integers for the column
numbers of the source file from GEO that maps yeast probe ids to ORF
ids |
yGenoUrl |
a character string for the url of the
directory on the Yeast Genome web site that contains the directories for
yeast annotation data. Currently it is
ftp://genome-ftp.stanford.edu/pub/yeast/data_download/ |
baseURL |
see yGenoUrl |
yGenoNames |
a vector of character strings for
the names of the yeast annotation data files. Each of the strings can be
appended to yGenoUrl to make a complete url for a data file |
fileName |
a character string for the extension part of the source data file that can be used to map target genes to SGD ids |
toKeep |
a list of vectors of integers giving the
column numbers of the yeast genome data files that will be
kept when the data files are processed. The length of toKeep must be the
same as that of yGenoNames (one vector for each file) |
colNames |
a list of vectors of character strings
for the names to be given to the columns that are kept when processing the
data. Again, the length of colNames must be the same as that of yGenoNames |
seps |
a vector of characters for the separators used
by the data files included in yGenoNames |
sep |
singular version of seps |
by |
a character string for the name of the column that is common
to all the data files to be processed. The column will be used to merge
the separate data files |
probe2ORF |
a matrix with mappings of yeast
target genes to ORF ids that in turn can be mapped to SGD ids |
gos |
a vector of character strings for GO ids
retrieved from the Yeast Genome Project |
evis |
a vector of character strings for the evidence
codes associated with the GO ids |
chr |
a vector of character strings for chromosome
numbers |
chrloc |
a vector of integers for chromosomal
locations |
chrori |
a vector of characters, each either
w or c, indicating the strand of a yeast chromosome |
srcUrl |
a character string for the url where the
source yeast genome data are stored |
yGenoName |
a character string for the name of the yeast
genome file to be processed |
To merge files, the system first maps the target genes in the base file to SGD ids and then uses the SGD ids to map the target genes to annotation data from the different sources.
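The merging step can be sketched with base R's merge(). This is only an illustration on invented data, not the package's own code: the probe ids, SGD ids, and table contents are made up, and only the column names (sgdid, pmid, genename) follow the colNames defaults shown in the usage.

```r
# Hypothetical base mapping: probe ids to SGD ids (invented values).
probe2sgd <- data.frame(probe = c("p1", "p2", "p3"),
                        sgdid = c("S0001", "S0002", "S0003"),
                        stringsAsFactors = FALSE)

# Two hypothetical annotation tables keyed by the shared column "sgdid".
literature <- data.frame(sgdid = c("S0001", "S0002"),
                         pmid = c("111", "222"),
                         stringsAsFactors = FALSE)
features <- data.frame(sgdid = c("S0001", "S0003"),
                       genename = c("GENE1", "GENE3"),
                       stringsAsFactors = FALSE)

# Merge each source onto the base table by the common column,
# keeping every probe even when a source has no entry for it.
annotated <- Reduce(function(x, y) merge(x, y, by = "sgdid", all.x = TRUE),
                    list(literature, features), probe2sgd)
```

Using all.x = TRUE keeps probes that lack an entry in one of the sources and fills the missing annotation with NA; whether yeastAnn treats such probes the same way is not stated here.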
formatGO
adds leading 0s to GO ids when needed and then
appends the evidence code to the end of each GO id following a "@".
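The padding and tagging described for formatGO might look roughly like the sketch below. The helper name format_go_sketch is hypothetical, and it assumes that the result carries a "GO:" prefix and that ids are padded to the seven digits used by GO identifiers; the packaged function may behave differently.

```r
# Hypothetical stand-in for formatGO(); the packaged function may differ.
format_go_sketch <- function(gos, evis) {
    # Strip an optional "GO:" prefix so only the numeric part is padded.
    digits <- sub("^GO:", "", as.character(gos))
    # Zero-pad the numeric part to seven digits and append "@" plus
    # the evidence code (the "GO:" prefix is an assumption).
    sprintf("GO:%07d@%s", as.integer(digits), evis)
}

format_go_sketch(c("5634", "GO:0006412"), c("IDA", "TAS"))
# "GO:0005634@IDA" "GO:0006412@TAS"
```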
formatChrLoc
assigns a + or - sign to chrloc
depending on whether the corresponding chrori
is w or c and
then appends chr
to the end of chrloc
following a "@".
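A rough sketch of the formatting described for formatChrLoc follows. The helper name format_chrloc_sketch is hypothetical, and it assumes that w maps to "+" and c maps to "-" and that the strand letter may arrive in either case; the input values are invented.

```r
# Hypothetical stand-in for formatChrLoc(); the packaged function may differ.
format_chrloc_sketch <- function(chr, chrloc, chrori) {
    # Assumption: "w" gives a "+" sign and "c" gives a "-" sign.
    strand_sign <- ifelse(tolower(chrori) == "w", "+", "-")
    # Signed location, then "@", then the chromosome number.
    paste0(strand_sign, chrloc, "@", chr)
}

format_chrloc_sketch(c("1", "12"), c(335L, 45899L), c("W", "C"))
# "+335@1" "-45899@12"
```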
getGEOYeast
gets yeast data from GEO for the columns
specified.
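How toKeep and colNames pair up when a single file is processed can be illustrated as below. This is a sketch under assumptions, not the code of procYeastGeno: the helper name keep_columns_sketch and the file content are invented, and the text is parsed in memory instead of being downloaded from yGenoUrl.

```r
# Invented three-column, tab-separated content standing in for a data file.
raw <- "S0001\tfoo\t111\nS0002\tbar\t222"

# Hypothetical helper: keep the columns listed in toKeep, in that order,
# and rename them with colNames.
keep_columns_sketch <- function(text, toKeep, colNames, sep = "\t") {
    tab <- read.table(text = text, sep = sep, header = FALSE,
                      quote = "", comment.char = "",
                      colClasses = "character")
    out <- as.matrix(tab[, toKeep, drop = FALSE])
    colnames(out) <- colNames
    out
}

kept <- keep_columns_sketch(raw, toKeep = c(3, 1), colNames = c("pmid", "sgdid"))
```

The order of toKeep sets the order of the output columns, so colNames has to be given in that same order.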
yeastAnn
returns a matrix with target genes annotated by
data from the selected data columns in the different data sources.
getProbe2SGD
returns a matrix with mappings between
target genes and SGD ids.
procYeastGeno
returns a data matrix.
formatGO
returns a vector of character strings.
formatChrLoc
returns a vector of character strings.
getGEOYeast
returns a matrix with the number of columns
specified.
Jianhua Zhang
## Not run: 
yeastData <- yeastAnn(GEOAccNum = "GPL90")
## End(Not run)