read.snps.long.old {snpMatrix} | R Documentation |
This function reads SNP genotype data and creates an object of class
"snp.matrix" or "X.snp.matrix".
Input data are assumed to be arranged as one line per
SNP-call (without any headers). This function can read gzipped files.
read.snps.long.old(file, chip.id, snp.id, codes, female, conf = 1, threshold = 0.9, drop=FALSE, sorted=FALSE, progress=interactive())
file |
Name of file containing the input data. Input files
which have been compressed by the gzip utility are recognized |
chip.id |
Array of type "character" containing (unique)
identifiers for the chips, samples, or subjects for which calls are
to be read. Other samples in the input data will be ignored |
snp.id |
Array of type "character" containing (unique)
identifiers of the SNPs for which data will be read. Again, further
SNPs in the input data will be ignored |
codes |
For autosomal SNPs, an array of length 3 giving the codes
for the three genotypes, in the order homozygous(AA), heterozygous(AB),
homozygous(BB). For X SNPs, an additional two codes for the male
genotypes (AY and BY) must be supplied. All other codes will be treated
as "no call". The default codes are "0", "1", "2" [, "0", "2"] |
female |
If the data to be read refer to SNPs on the X chromosome, this
argument must be supplied and should indicate whether each row of
data refers to a female (TRUE) or to a male (FALSE). The output
object will then be of class "X.snp.matrix". |
conf |
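Controls how the confidence score is used: 0 if no score is present,
positive to accept scores above threshold, negative to accept scores
below it. See details |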
drop |
If TRUE, any rows or columns without genotype calls
will be dropped from the output matrix. Otherwise the full matrix,
with rows and columns defined by the chip.id and snp.id arguments,
will be returned |
threshold |
Acceptance threshold for confidence score |
sorted |
Is input file already sorted into the correct order (see details)? |
progress |
If TRUE, progress will be reported to the
standard output stream |
Data are assumed to be input with one line per call, in free
format:
<chip-id> <snp-id> <code for genotype call>
[<confidence>] ...
Currently, any fields following the first three (or four) are
ignored. If the argument sorted is TRUE, the file is assumed to be
sorted with snp-id as primary key and chip-id as secondary key using
the current locale. The rows and columns of the returned matrix will
also be ordered in this manner. If sorted is set to FALSE, then an
algorithm which avoids this assumption is used. The rows and columns
of the returned matrix will then be in the same order as the input
chip.id and snp.id vectors. Calls in which both id fields match
elements in the chip.id and snp.id arguments are read in, after
(optionally) checking that the level of confidence achieves a given
threshold.
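As an illustration of the input layout described above, the following sketch writes a small long-format file with base R and then passes it to the function. The chip and SNP identifiers, the code "9" used as a no-call, and the confidence values are all made up for the example, and the call is only attempted if the snpMatrix package happens to be installed.

```r
# Hypothetical toy data: 2 chips x 2 SNPs, with a confidence column.
calls <- data.frame(
  chip = c("chipA", "chipB", "chipA", "chipB"),
  snp  = c("rs1", "rs1", "rs2", "rs2"),
  code = c("0", "1", "2", "9"),   # "9" is not among the genotype codes below
  conf = c(0.99, 0.95, 0.97, 0.50),
  stringsAsFactors = FALSE
)

# One line per call, whitespace-separated, no header line.
infile <- tempfile(fileext = ".txt")
write.table(calls, infile, quote = FALSE, row.names = FALSE, col.names = FALSE)
input_lines <- readLines(infile)

# Only attempt the read when snpMatrix is available.
if (requireNamespace("snpMatrix", quietly = TRUE)) {
  snps <- snpMatrix::read.snps.long.old(
    infile,
    chip.id   = c("chipA", "chipB"),
    snp.id    = c("rs1", "rs2"),
    codes     = c("0", "1", "2"),
    conf      = 1,     # a confidence score is present, higher is better
    threshold = 0.9
  )
}
```

With these settings one would expect the "9" call to be treated as a no-call, both because the code is unrecognised and because its score is below the threshold.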
Confidence level checking is controlled by the conf argument. conf=0
indicates that no confidence score is present and no checking is
done. conf>0 indicates that calls with scores above threshold are
accepted, while conf<0 indicates that only calls with scores below
threshold should be accepted.
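The acceptance rule just described can be sketched in plain R. The helper accept_call below is hypothetical and is not part of snpMatrix. It uses strict comparisons, which is an assumption: the text above does not say how a score exactly equal to threshold is treated.

```r
# Hypothetical helper mirroring the documented rule; not part of snpMatrix.
accept_call <- function(score, conf = 1, threshold = 0.9) {
  if (conf == 0) {
    rep(TRUE, length(score))   # no score present, so nothing is checked
  } else if (conf > 0) {
    score > threshold          # accept scores above the threshold
  } else {
    score < threshold          # accept scores below the threshold
  }
}

scores <- c(0.99, 0.95, 0.50)
high_is_good <- accept_call(scores, conf = 1,  threshold = 0.9)
low_is_good  <- accept_call(scores, conf = -1, threshold = 0.9)
unchecked    <- accept_call(scores, conf = 0)
```

Here high_is_good accepts the first two calls, low_is_good accepts only the third, and unchecked accepts all three.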
The routine is case-sensitive and it is important that the
<chip-id> and <snp-id> match the cases of chip.id and snp.id exactly.
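Because matching is case-sensitive, it may help to check identifiers before reading. The sketch below uses made-up identifiers and flags requested ids that differ from ids in the file only by case. It is a base R pre-check and does not call snpMatrix.

```r
# Hypothetical identifiers: those found in the file and those requested.
file_ids <- c("ChipA", "chipB")
wanted   <- c("chipA", "chipB")

# Exact matches are case-sensitive, as in the routine itself.
exact <- wanted %in% file_ids

# Requested ids that would match only if case were ignored.
case_only <- !exact & (toupper(wanted) %in% toupper(file_ids))
suspect_ids <- wanted[case_only]
```

Any id in suspect_ids would be silently ignored by a case-sensitive reader, so it is worth correcting before the call.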
An object of class snp.matrix, or of class X.snp.matrix when the
female argument is supplied.
If more than one instance of any combination of chip.id element and
snp.id element passes the confidence threshold, the call to be used
is decided by the following rules:
Use of sorted=TRUE is usually discouraged since the alternative
algorithm is safer and, usually, not appreciably slower. However, if
the input file is to be read multiple times and there is a reasonably
close correspondence between cells of the matrix to be returned and
lines of the input file, the sorted option can be faster.
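If the sorted option is wanted nonetheless, the input must first be put in the expected order. The sketch below sorts a made-up long-format table with SNP id as primary key and chip id as secondary key using base R's order(). That R's collation for character keys agrees with the ordering the function expects in every locale is an assumption.

```r
# Hypothetical unsorted calls in long format.
calls <- data.frame(
  chip = c("chipB", "chipA", "chipB", "chipA"),
  snp  = c("rs2", "rs1", "rs1", "rs2"),
  code = c("1", "0", "2", "1"),
  stringsAsFactors = FALSE
)

# Primary key: SNP id; secondary key: chip id.
sorted_calls <- calls[order(calls$snp, calls$chip), ]

# Write in the one-line-per-call layout, without a header.
sorted_file <- tempfile(fileext = ".txt")
write.table(sorted_calls, sorted_file, quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```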
This function has been replaced by the more flexible function
read.snps.long.
David Clayton david.clayton@cimr.cam.ac.uk and Hin-Tak Leung
http://www-gene.cimr.cam.ac.uk/clayton
snp.matrix-class, X.snp.matrix-class