scoring.matrices {Biostrings} | R Documentation |
Predefined substitution scoring matrices for nucleotide and amino acid alignments.
data(BLOSUM45)
data(BLOSUM50)
data(BLOSUM62)
data(BLOSUM80)
data(BLOSUM100)
data(PAM30)
data(PAM40)
data(PAM70)
data(PAM120)
data(PAM250)
A square symmetric matrix with integer coefficients. The row and column names are identical and unique: each name is a single letter representing a nucleotide or an amino acid.
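The properties listed above can be checked programmatically. The following is a minimal sketch, not part of Biostrings: the helper name is_scoring_matrix and the toy 4x4 nucleotide matrix are hypothetical and used only for illustration.

```r
# Toy nucleotide scoring matrix with made-up scores (NOT a bundled matrix).
bases <- c("A", "C", "G", "T")
toy <- matrix(-1L, nrow = 4, ncol = 4, dimnames = list(bases, bases))
diag(toy) <- 2L

# Hypothetical helper: TRUE if 'm' has the format described above.
is_scoring_matrix <- function(m) {
  is.matrix(m) &&
    is.numeric(m) &&
    all(m == round(m)) &&                     # integer coefficients
    nrow(m) == ncol(m) &&                     # square
    !is.null(rownames(m)) &&
    identical(rownames(m), colnames(m)) &&    # identical row/column names
    anyDuplicated(rownames(m)) == 0L &&       # unique names
    all(nchar(rownames(m)) == 1L) &&          # single-letter names
    all(m == t(m))                            # symmetric
}

is_scoring_matrix(toy)
```

Assuming Biostrings is installed, the same helper could presumably be applied to a bundled matrix after data(BLOSUM62).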
Note that different versions of a given scoring matrix can exist. For example, the definition of the widely used BLOSUM62 matrix varies depending on the source. Even a single source can provide different versions of it, yet the name is always BLOSUM62 and no history or versioning mechanism is provided. NCBI, for example, provides many matrices at ftp://ftp.ncbi.nih.gov/blast/matrices/ but their definitions do not match those of the matrices bundled with its standalone BLAST software, available at ftp://ftp.ncbi.nih.gov/blast/
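Because two matrices with the same name may differ, it can help to list exactly which cells disagree. The sketch below assumes two versions are already loaded as named numeric matrices. The helper diff_cells and the toy matrices v1 and v2 are hypothetical and do not come from Biostrings or NCBI.

```r
# Two toy "versions" of a matrix with made-up scores.
bases <- c("A", "C", "G", "T")
v1 <- matrix(-1L, nrow = 4, ncol = 4, dimnames = list(bases, bases))
diag(v1) <- 2L
v2 <- v1
v2["A", "G"] <- 0L
v2["G", "A"] <- 0L

# Hypothetical helper: report the cells where 'a' and 'b' disagree,
# restricted to the letters present in both matrices.
diff_cells <- function(a, b) {
  common <- intersect(rownames(a), rownames(b))
  a <- a[common, common, drop = FALSE]
  b <- b[common, common, drop = FALSE]
  idx <- which(a != b, arr.ind = TRUE)   # two-column (row, col) index matrix
  data.frame(row = common[idx[, 1]],
             col = common[idx[, 2]],
             a = a[idx],
             b = b[idx],
             stringsAsFactors = FALSE)
}

d <- diff_cells(v1, v2)
d
```

Restricting the comparison to the shared letters is a design choice made here so that matrices covering slightly different alphabets can still be compared.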
The BLOSUM45, BLOSUM62, BLOSUM80, PAM30 and PAM70 matrices were taken from the NCBI standalone BLAST software.
The BLOSUM50, BLOSUM100, PAM40, PAM120 and PAM250 matrices were taken from ftp://ftp.ncbi.nih.gov/blast/matrices/
needwunsQS, BStringAlign-class, DNAString-class, AAString-class
## Align 2 amino acid sequences with the BLOSUM62 matrix
aa1 <- AAString("HXBLVYMGCHFDCXVBEHIKQZ")
aa2 <- AAString("QRNYMYCFQCISGNEYKQN")
needwunsQS(aa1, aa2, "BLOSUM62", gappen=3)

## See how the gap penalty influences the alignment
needwunsQS(aa1, aa2, "BLOSUM62", gappen=8)

## See how the scoring matrix influences the alignment
needwunsQS(aa1, aa2, "BLOSUM50", gappen=3)

## Compare our BLOSUM62 with BLOSUM62 from ftp://ftp.ncbi.nih.gov/blast/matrices/
data(BLOSUM62)
BLOSUM62["Q", "Z"]
file <- "ftp://ftp.ncbi.nih.gov/blast/matrices/BLOSUM62"
b62 <- as.matrix(read.table(file, check.names=FALSE))
b62["Q", "Z"]