matchprobes {matchprobes} | R Documentation |
The query sequence, a character string (probably representing a transcript of interest), is scanned for the presence of exact matches to the sequences in the character vector records. The indices of the set of matches are returned.
matchprobes(query, records, probepos=FALSE)
query | A character vector. For example, each element may represent a gene (transcript) of interest. See Details. |
records | A character vector. For example, each element may represent the probes on a DNA array. |
probepos | A logical value. If TRUE, return also the start positions of the matches in the query sequence. |
toupper is applied to the arguments query and records before matching. The intention of this is to make the matching case-insensitive.
The matching is done using the C library function strstr. It might be nice to explore other possibilities.
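The case folding and exact substring search described above can be approximated in base R. The following is a minimal sketch, not the package's C implementation: it assumes that grepl with fixed = TRUE is an acceptable stand-in for strstr, and the helper name match_one is hypothetical.

```r
# Hypothetical helper: indices of the records that occur verbatim in one query.
# Assumption: grepl(fixed = TRUE) stands in for the C library function strstr.
match_one <- function(query, records) {
  # Fold both inputs to upper case so that the comparison ignores case.
  query   <- toupper(query)
  records <- toupper(records)
  # Test each record for an exact (non-regex) occurrence in the query.
  hits <- vapply(records, function(p) grepl(p, query, fixed = TRUE),
                 logical(1), USE.NAMES = FALSE)
  which(hits)
}

# "CGTA" and "ttga" both occur in the query once case is ignored; "GGGG" does not.
match_one("acgtacgtTTGA", c("CGTA", "ttga", "GGGG"))
```

Using fixed = TRUE keeps the search literal, so characters that are special in regular expressions would be matched as-is.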
A list.
Its first element is a list of the same length as the input vector query. Each element of that list is a numeric vector containing the indices of the probes (elements of records) that have a perfect match in the corresponding query sequence.
If probepos is TRUE, the returned list has a second element: it is of the same shape as described above, and gives the respective start positions of the matches.
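To make the shape of the return value concrete, the sketch below builds a list with the same layout using base R only. It is an illustration under stated assumptions, not the package code: the function name sketch_matchprobes and the element names match and pos are hypothetical, and only the first occurrence of each probe in a query is reported.

```r
# Hypothetical re-creation of the described return structure in base R.
# Assumptions: element names 'match' and 'pos' are illustrative only, and
# regexpr() reports just the first occurrence of each probe in a query.
sketch_matchprobes <- function(query, records, probepos = FALSE) {
  query   <- toupper(query)
  records <- toupper(records)
  # For every query: 1-based start of each record's first exact match, or -1.
  starts <- lapply(query, function(q) {
    vapply(records, function(p) as.integer(regexpr(p, q, fixed = TRUE)),
           integer(1), USE.NAMES = FALSE)
  })
  # First element: per query, the indices of the records that matched.
  res <- list(match = lapply(starts, function(s) which(s > 0)))
  # Optional second element: same shape, holding the start positions.
  if (probepos) res$pos <- lapply(starts, function(s) s[s > 0])
  res
}

res <- sketch_matchprobes(c("acgtacgtTTGA", "GGGGCC"),
                          c("CGTA", "ttga", "GGGG"), probepos = TRUE)
str(res)
```

Here res$match has one entry per query sequence, and res$pos lines up with it element by element, so the i-th position in res$pos[[k]] belongs to the i-th index in res$match[[k]].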
R. Gentleman, Laurent Gautier, Wolfgang Huber
## The main intention for this function is for use together with the probe
## tables from the "probe" data packages, e.g.:
## > library(hgu95av2probe)
## > data(probe)
## > seq <- probe$sequence
##
## Since we do not want to be dependent on the presence of this
## data package, for the sake of example we simply simulate some
## probe sequences:
bases <- c("A", "C", "G", "T")
trsk <- sapply(1:10, function(x) paste(bases[ceiling(4*runif(1280))], collapse=""))
seq <- sample(trsk, 20, replace=TRUE)
starts <- ceiling((nchar(seq)-26)*runif(length(seq)))
seq <- substr(seq, starts, starts+24)
seq <- c(seq, complementSeq(seq, start=13, stop=13))
matchprobes(trsk, seq, probepos=TRUE)