daglad {GLAD}    R Documentation

Analysis of array CGH data

Description

This function detects breakpoints in genomic profiles obtained by array CGH technology and assigns a status (gain, normal or loss) to each clone.

Usage


## S3 method for class 'profileCGH':
daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
                  smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
                  qlambda=0.999,  bandwidth=10, sigma=NULL, base=FALSE, round=1.5,
                  lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=5,
                  method="centroid", nmin=1, nmax=8,
                  amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
                  MinBkpWeight=0.35, CheckBkpPos=TRUE, assignGNLOut=TRUE,
                  verbose=FALSE, ...)

Arguments

profileCGH An object of class profileCGH.
mediancenter If TRUE, the LogRatio values are centered on their median.
genomestep If TRUE, a smoothing step is performed over the whole genome, and a "clustering throughout the genome" step identifies the cluster corresponding to the normal DNA level. The thresholds used in the daglad function (deltaN, forceGL, amplicon, deletion) are then applied relative to the median of this cluster.
normalrefcenter If TRUE, the LogRatio values are centered on the median of the cluster identified during the genomestep.
smoothfunc Type of algorithm used to smooth LogRatio by a piecewise constant function. The default is lawsglad; aws or laws can also be chosen.
lkern lkern determines the location kernel to be used (see laws for details).
model model determines the distribution type of LogRatio (see laws for details).
qlambda qlambda determines the scale parameter qlambda for the stochastic penalty (see laws for details).
base If TRUE, the position of each clone is its physical position on the chromosome; otherwise the rank position is used.
sigma Value passed either to the sigma2 argument of the aws function or to the shape argument of laws. If NULL, sigma is estimated from the data.
bandwidth Sets the maximal bandwidth hmax of the aws or laws function. For example, if bandwidth=10, hmax is set to 10*X_N, where X_N is the position of the last clone.
round Controls the rounding of the smoothing values returned by the aws or laws function: the round value is passed to the digits argument of the round function.
lambdabreak Penalty term (λ') used during the "Optimization of the number of breakpoints" step.
lambdaclusterGen Penalty term (λ*) used during the "clustering throughout the genome" step.
param Parameter of the kernel used in the penalty term.
alpha Risk alpha used for the "Outlier detection" step.
msize The MAD used for outlier detection is computed only on regions containing at least msize clones.
method The agglomeration method to be used during the "clustering throughout the genome" step.
nmin Minimum number of clusters allowed during the "clustering throughout the genome" clustering step.
nmax Maximum number of clusters (N*max) allowed during the "clustering throughout the genome" clustering step.
amplicon Levels (and outliers) with a smoothing value (log-ratio value) greater than this threshold are considered amplicons. Note that the data are first centered on the normal reference value computed during the "clustering throughout the genome" step.
deletion Levels (and outliers) with a smoothing value (log-ratio value) lower than this threshold are considered deletions. Note that the data are first centered on the normal reference value computed during the "clustering throughout the genome" step.
deltaN Regions with smoothing values within the interval [-deltaN, +deltaN] are considered normal.
forceGL Levels with a smoothing value greater than forceGL[2] (lower than forceGL[1]) are considered gains (losses). Note that the data are first centered on the normal reference value computed during the "clustering throughout the genome" step.
nbsigma For each breakpoint, a weight is computed as a function of the absolute value of the Gap between the smoothing values of the two consecutive regions: Weight = 1 - kernelpen(abs(Gap), param=c(d=nbsigma*Sigma)), where Sigma is the standard deviation of the LogRatio.
MinBkpWeight Breakpoints with GNLchange==0 and a Weight less than MinBkpWeight are discarded.
CheckBkpPos If TRUE, the position of each breakpoint is checked for accuracy.
assignGNLOut If FALSE, the status (gain/normal/loss) is not assigned to outliers.
verbose If TRUE, some information is printed.
...
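The outlier detection controlled by alpha and msize can be pictured with the following minimal sketch. It is not GLAD's internal code: it assumes that, within each Level containing at least msize clones, a clone is flagged when its deviation from the level median, divided by the MAD, falls in the α/2 tail of a standard normal distribution. The function name flag_mad_outliers is hypothetical.

```r
# Hypothetical sketch of MAD-based outlier flagging (not GLAD's implementation).
# Assumption: within each level of size >= msize, a clone is an outlier when
# (x - median) / MAD lies beyond the standard-normal quantile qnorm(1 - alpha/2).
flag_mad_outliers <- function(logratio, level, alpha = 0.001, msize = 5) {
  flags  <- integer(length(logratio))       # 0 = not an outlier
  cutoff <- qnorm(1 - alpha / 2)            # two-sided threshold
  for (lv in unique(level)) {
    idx <- which(level == lv)
    if (length(idx) < msize) next           # region too small: no MAD computed
    x <- logratio[idx]
    s <- mad(x)                             # robust scale (constant 1.4826)
    if (s == 0) next                        # constant region: cannot standardise
    z <- (x - median(x)) / s
    flags[idx[z >  cutoff]] <-  1L          # upper alpha/2 tail
    flags[idx[z < -cutoff]] <- -1L          # lower alpha/2 tail
  }
  flags
}

# One level of 7 clones with a single extreme value
lr <- c(0.01, -0.02, 0.00, 0.03, -0.01, 2.5, 0.02)
flag_mad_outliers(lr, rep(1, 7))            # returns c(0, 0, 0, 0, 0, 1, 0)
```

Regions smaller than msize are skipped entirely, which mirrors the requirement that the MAD be computed only on sufficiently large regions.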

Details

The function daglad implements a slightly modified version of the methodology described in the article "Analysis of array CGH data: from signal ratio to gain and loss of DNA regions" (Hupé et al., Bioinformatics 2004, 20(18):3413-3422). The daglad function lets the user choose thresholds that help the algorithm identify the status of the genomic regions. These thresholds are given by the arguments deltaN, forceGL, amplicon and deletion.
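To illustrate how the four thresholds relate to the ZoneGNL codes, here is a deliberately simplified sketch. It is an assumption, not daglad's actual decision procedure, which also relies on the clustering results: values are centered on the normal reference, the deltaN band gives Normal, forceGL forces Gain or Loss, and amplicon and deletion override them. Values not decided by the thresholds alone are left as NA. The function name assign_zone_gnl is hypothetical.

```r
# Hypothetical, simplified threshold mapping (not daglad's exact algorithm).
# Codes follow the ZoneGNL convention: 2 amplicon, 1 gain, 0 normal,
# -1 loss, -10 deletion; NA = left undecided by the thresholds alone.
assign_zone_gnl <- function(smoothing, normal_ref = 0,
                            amplicon = 1, deletion = -5,
                            deltaN = 0.10, forceGL = c(-0.15, 0.15)) {
  s <- smoothing - normal_ref               # centre on the normal reference
  zone <- rep(NA_integer_, length(s))
  zone[abs(s) <= deltaN] <- 0L              # normal band [-deltaN, +deltaN]
  zone[s > forceGL[2]]   <- 1L              # forced gain
  zone[s < forceGL[1]]   <- -1L             # forced loss
  zone[s > amplicon]     <- 2L              # amplicon overrides gain
  zone[s < deletion]     <- -10L            # deletion overrides loss
  zone
}

assign_zone_gnl(c(0.05, 0.12, 0.3, 1.4, -0.2, -6))
# returns c(0, NA, 1, 2, -1, -10)
```

With the default arguments, a value such as 0.12 falls between deltaN and forceGL[2], so in this sketch its status would have to come from another source of evidence.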

Value

An object of class "profileCGH" with the following attributes:
profileValues a data.frame with the following added information:

    Smoothing
    The smoothing value of each clone is the median of its Level.

    Breakpoints
    The last position of a region with an identical amount of DNA is flagged by 1; otherwise the value is 0. Note that breakpoints removed during the "Optimization of the number of breakpoints" step are flagged by -1.

    Level
    Positions with an equal smoothing value are labelled with the same integer, starting from one. The label is incremented by one when a new level begins or when moving to the next chromosome.

    OutliersAws
    Each AWS outlier is flagged by -1 if it lies in the α/2 lower tail of the distribution, or by 1 if it lies in the α/2 upper tail; otherwise the value is 0.

    OutliersMad
    Each MAD outlier is flagged by -1 if it lies in the α/2 lower tail of the distribution, or by 1 if it lies in the α/2 upper tail; otherwise the value is 0.

    OutliersTot
    OutliersAws + OutliersMad.

    NormalRef
    Clusters used to set the normal reference during the "clustering throughout the genome" step are coded by 0. Note that if genomestep=FALSE, all values are set to 0.

    ZoneGNL
    Status of each clone: Gain is coded by 1, Loss by -1, Amplicon by 2, deletion by -10 and Normal by 0.
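The relation between the Smoothing, Level and Breakpoints columns can be reproduced with a few lines of base R. This is a hedged sketch that assumes the clones are sorted by chromosome and position, and that a breakpoint is flagged only inside a chromosome, never at its last clone; the function name label_levels is hypothetical and the -1 flags for removed breakpoints are not modelled.

```r
# Hypothetical sketch: derive Level and Breakpoints from a piecewise-constant
# Smoothing vector (clones assumed ordered by chromosome and position).
label_levels <- function(chromosome, smoothing) {
  n <- length(smoothing)
  # TRUE where a new run starts: first clone, new chromosome or new value
  new_run <- c(TRUE, chromosome[-1] != chromosome[-n] |
                     smoothing[-1]  != smoothing[-n])
  level <- cumsum(new_run)                  # labels start at 1
  # Assumption: a breakpoint closes a run followed by another run
  # on the same chromosome
  same_chr_next <- c(chromosome[-1] == chromosome[-n], FALSE)
  breakpoints <- as.integer(c(new_run[-1], FALSE) & same_chr_next)
  data.frame(Chromosome = chromosome, Smoothing = smoothing,
             Level = level, Breakpoints = breakpoints)
}

label_levels(chromosome = c(1, 1, 1, 1, 2, 2, 2),
             smoothing  = c(0, 0, 0.4, 0.4, 0.4, -0.3, -0.3))
# Level: 1 1 2 2 3 4 4; Breakpoints: 0 1 0 0 1 0 0
```

Note that the fifth clone starts Level 3 even though its smoothing value equals that of Level 2, because the label is also incremented when moving to the next chromosome.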


BkpInfo A data.frame summarizing the information for each breakpoint:
    Chromosome
    Chromosome name.
    Smoothing
    Smoothing value for the breakpoint.
    Gap
    Absolute value of the gap between the smoothing values of the two consecutive regions.
    Sigma
    Estimate of the standard deviation of the LogRatio on the chromosome.
    Weight
    1 - kernelpen(Gap, type, param=c(d=nbsigma*Sigma))
    ZoneGNL
    Status of the level on which the breakpoint lies.
    GNLchange
    Takes the value 1 if the ZoneGNL values of the two consecutive regions differ, and 0 otherwise.
    LogRatio
    Test over Reference log-ratio.
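The Weight column and the MinBkpWeight filter can be sketched as follows. The kernel shape is an assumption: this sketch uses a tricubic kernel, K(x) = (1 - (|x|/d)^3)^3 for |x| < d and 0 otherwise, in place of GLAD's kernelpen, whose type and defaults may differ. The names tricubic_kernel, bkp_weight and filter_breakpoints are hypothetical.

```r
# Hypothetical sketch of the breakpoint weight and the MinBkpWeight filter.
# Assumption: a tricubic kernel stands in for GLAD's kernelpen.
tricubic_kernel <- function(x, d) {
  u <- abs(x) / d
  ifelse(u < 1, (1 - u^3)^3, 0)
}

# Weight = 1 - K(|Gap|) with bandwidth d = nbsigma * Sigma
bkp_weight <- function(gap, sigma, nbsigma = 3) {
  1 - tricubic_kernel(abs(gap), d = nbsigma * sigma)
}

# Discard breakpoints with GNLchange == 0 and Weight < MinBkpWeight
filter_breakpoints <- function(bkp, MinBkpWeight = 0.35, nbsigma = 3) {
  bkp$Weight <- bkp_weight(bkp$Gap, bkp$Sigma, nbsigma)
  keep <- !(bkp$GNLchange == 0 & bkp$Weight < MinBkpWeight)
  bkp[keep, , drop = FALSE]
}

bkp <- data.frame(Gap = c(0.05, 0.5, 0.05), Sigma = 0.1, GNLchange = c(0, 0, 1))
filter_breakpoints(bkp)   # keeps rows 2 and 3
```

A small gap relative to nbsigma*Sigma yields a weight close to 0, so such a breakpoint survives only if it changes the ZoneGNL status; a gap of at least nbsigma*Sigma always gives a weight of 1.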

NormalRef If genomestep=TRUE and normalrefcenter=FALSE, then NormalRef is the median of the cluster which has been used to set the normal reference during the "clustering throughout the genome" step. Otherwise NormalRef is 0.
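A rough picture of how a normal reference could be obtained from a genome-wide clustering of the levels is given below. This sketch makes simplifying assumptions that differ from daglad: the number of clusters k is fixed instead of being selected between nmin and nmax with the lambdaclusterGen penalty, and the cluster containing the most clones is taken as normal. It uses the base R functions hclust (with method="centroid", on squared Euclidean distances) and cutree; the function name normal_reference is hypothetical.

```r
# Hypothetical sketch of picking a normal reference (not daglad's method).
# Assumptions: k is fixed, and the cluster holding most clones is "normal".
normal_reference <- function(smoothing, level, k = 3) {
  lv_median <- tapply(smoothing, level, median)           # one value per level
  lv_size   <- as.vector(tapply(smoothing, level, length)) # clones per level
  if (length(lv_median) <= k) {
    cl <- seq_along(lv_median)              # each level is its own cluster
  } else {
    # centroid linkage is meant to be used with squared Euclidean distances
    hc <- hclust(dist(lv_median)^2, method = "centroid")
    cl <- cutree(hc, k = k)
  }
  clones_per_cluster <- tapply(lv_size, cl, sum)
  normal_cluster <- as.integer(names(which.max(clones_per_cluster)))
  normal_levels  <- names(lv_median)[cl == normal_cluster]
  median(smoothing[as.character(level) %in% normal_levels])
}

sm <- c(rep(0.01, 5), rep(0.03, 10), rep(0.5, 3), rep(-0.6, 2))
lv <- rep(1:4, c(5, 10, 3, 2))
normal_reference(sm, lv)                    # returns 0.03
```

Under these assumptions, normalrefcenter=TRUE would correspond to subtracting this value from the LogRatio, after which the reported NormalRef is 0.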

Note

People interested in tools for array CGH analysis can visit our web page http://bioinfo.curie.fr.

Author(s)

Philippe Hupé, glad@curie.fr.

See Also

glad.

Examples


data(snijders)
gm13330$Clone <- gm13330$BAC
profileCGH <- as.profileCGH(gm13330)

###########################################################
###
###  daglad function
###
###########################################################

res <- daglad(profileCGH, mediancenter=FALSE, normalrefcenter=FALSE, genomestep=FALSE,
              smoothfunc="lawsglad", lkern="Exponential", model="Gaussian",
              qlambda=0.999,  bandwidth=10, base=FALSE, round=1.5,
              lambdabreak=8, lambdaclusterGen=40, param=c(d=6), alpha=0.001, msize=5,
              method="centroid", nmin=1, nmax=8,
              amplicon=1, deletion=-5, deltaN=0.10,  forceGL=c(-0.15,0.15), nbsigma=3,
              MinBkpWeight=0.35, CheckBkpPos=TRUE)

### Genomic profile on the whole genome
plotProfile(res, unit=3, Bkp=TRUE, labels=FALSE, Smoothing="Smoothing",
main="Breakpoints detection: DAGLAD analysis")


###Genomic profile for chromosome 1
plotProfile(res, unit=3, Bkp=TRUE, labels=TRUE, Chromosome=1,
Smoothing="Smoothing", main="Chromosome 1: DAGLAD analysis")

### The standard deviations of LogRatio are:
res$SigmaC

### The list of breakpoints is:
res$BkpInfo


[Package GLAD version 1.18.0 Index]