regionOverlap {Ringo}    R Documentation

Function to compute overlap of genomic regions

Description

Given two data frames of genomic regions, this function computes the base-pair overlap, if any, between every pair of regions from the two lists.

Usage

regionOverlap(xdf, ydf, chrColumn = "chr", startColumn = "start",
endColumn = "end", mem.limit=1e8)

Arguments

xdf data.frame that holds the first set of genomic regions
ydf data.frame that holds the second set of genomic regions
chrColumn character; what is the name of the column that holds the chromosome name of the regions in xdf and ydf
startColumn character; what is the name of the column that holds the start position of the regions in xdf and ydf
endColumn character; what is the name of the column that holds the end position of the regions in xdf and ydf
mem.limit integer value; what is the maximal allowed size of matrices during the computation

Value

Conceptually, a matrix with nrow(xdf) rows and nrow(ydf) columns, in which entry X[i,j] specifies the length, in base pairs, of the overlap between region i of the first list (xdf) and region j of the second list (ydf). Since this matrix is usually very sparse, it is returned in the matrix.csr representation from the SparseM package.
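To make the meaning of entry X[i,j] concrete, here is a minimal base-R sketch that builds the dense version of such a matrix. The helper pairwiseOverlap is hypothetical and is not part of Ringo. It assumes closed intervals (both start and end belong to the region), a convention this page does not confirm.

```r
# Hypothetical helper, not part of Ringo: dense pairwise overlap in base R.
# Assumption: closed intervals [start, end]; drop the "+ 1" for half-open ones.
pairwiseOverlap <- function(xdf, ydf) {
  # outer() evaluates every (row of xdf, row of ydf) pairing at once
  sameChr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  left    <- outer(xdf$start, ydf$start, pmax)  # larger of the two starts
  right   <- outer(xdf$end,   ydf$end,   pmin)  # smaller of the two ends
  # negative lengths mean "no overlap"; regions on different chromosomes get 0
  pmax(right - left + 1, 0) * sameChr
}

x <- data.frame(chr   = c("chr1", "chr7", "chr8", "chr1", "chrX", "chr8"),
                start = c(100, 100, 100, 510, 100, 60),
                end   = c(200, 200, 200, 520, 200, 80))
y <- data.frame(chr   = c("chr1", "chr2", "chr7", "chr8", "chr9"),
                start = c(500, 100, 50, 80, 100),
                end   = c(700, 200, 250, 120, 200))

ov <- pairwiseOverlap(x, y)
ov[2, 3]  # chr7:100-200 vs chr7:50-250  -> 101 under the closed-interval assumption
ov[4, 1]  # chr1:510-520 vs chr1:500-700 -> 11
```

A dense matrix like this is only practical for small inputs, which is why a sparse representation is the natural return type.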

Note

The function only returns the absolute length of the overlap in base pairs. It does not return the position of the overlap or the fraction of region 1 and/or region 2 that overlaps the other region.
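If overlap fractions are needed, they can be derived from the overlap lengths and the region widths. The following is a sketch under stated assumptions: the small matrix ov stands in for as.matrix() applied to the function's result and its values are entered by hand, and region widths are computed for closed intervals. The names fracX and fracY are illustrative only.

```r
# Two regions per list, all assumed to lie on the same chromosome.
xdf <- data.frame(start = c(100, 510), end = c(200, 520))
ydf <- data.frame(start = c(50, 500),  end = c(250, 700))

# Stand-in for the dense overlap matrix (rows: xdf, columns: ydf).
# Hand-entered values, assuming closed intervals.
ov <- matrix(c(101, 0,
               0,   11), nrow = 2, byrow = TRUE)

# Assumption: width of a closed interval is end - start + 1.
xWidth <- xdf$end - xdf$start + 1
yWidth <- ydf$end - ydf$start + 1

# Fraction of each xdf region covered: divide row i by xWidth[i].
# R recycles the vector down the columns, which matches the row index.
fracX <- ov / xWidth

# Fraction of each ydf region covered: divide column j by yWidth[j].
fracY <- sweep(ov, 2, yWidth, "/")
```

Here both xdf regions lie entirely inside their ydf partner, so the diagonal of fracX is 1, while fracY holds the much smaller fractions 101/201 and 11/201.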

The argument mem.limit is not a strict limit on RAM usage, but rather the maximal size of matrices that should be allowed during the computation. If larger matrices would arise, the second regions list is split into parts and the overlap with the first list is computed for each part. During computation, matrices of size nrow(xdf) times nrow(ydf) are created.
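The splitting strategy described above can be illustrated as follows. This is a hypothetical base-R sketch, not Ringo's actual implementation. It assumes that mem.limit counts matrix entries and that regions are closed intervals; the helpers overlapBlock and chunkedOverlap are made up for this illustration.

```r
# Hypothetical helper: dense overlap of all xdf regions with one part of ydf.
# Assumption: closed intervals [start, end].
overlapBlock <- function(xdf, ydf) {
  sameChr <- outer(as.character(xdf$chr), as.character(ydf$chr), "==")
  left    <- outer(xdf$start, ydf$start, pmax)
  right   <- outer(xdf$end,   ydf$end,   pmin)
  pmax(right - left + 1, 0) * sameChr
}

# Hypothetical helper: split ydf so that no intermediate matrix has more
# than mem.limit entries (assumed meaning of mem.limit), then join the parts.
chunkedOverlap <- function(xdf, ydf, mem.limit = 1e8) {
  # most ydf rows per part such that nrow(xdf) * rows <= mem.limit
  perChunk <- max(1, floor(mem.limit / nrow(xdf)))
  rows     <- seq_len(nrow(ydf))
  groups   <- split(rows, ceiling(rows / perChunk))
  blocks   <- lapply(groups, function(idx) {
    overlapBlock(xdf, ydf[idx, , drop = FALSE])
  })
  # each part contributes a block of columns, in the original ydf order
  do.call(cbind, unname(blocks))
}

xdf <- data.frame(chr = c("chr1", "chr7", "chr8"),
                  start = c(510, 100, 100), end = c(520, 200, 200))
ydf <- data.frame(chr = c("chr1", "chr2", "chr7", "chr8", "chr9"),
                  start = c(500, 100, 50, 80, 100),
                  end = c(700, 200, 250, 120, 200))

full    <- overlapBlock(xdf, ydf)                   # one 3 x 5 matrix
chunked <- chunkedOverlap(xdf, ydf, mem.limit = 6)  # parts of at most 3 x 2
```

Both calls yield the same 3 x 5 result; the chunked version simply never holds more than six entries in a single intermediate matrix.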

Author(s)

Joern Toedling toedling@ebi.ac.uk

See Also

matrix.csr-class

Examples

  ## toy example:
  library(Ringo)
  regionsH3ac <- data.frame(chr=c("chr1","chr7","chr8","chr1","chrX","chr8"),
                            start=c(100,100,100,510,100,60),
                            end=c(200,200,200,520,200,80))
  regionsH4ac <- data.frame(chr=c("chr1","chr2","chr7","chr8","chr9"),
                            start=c(500,100,50,80,100),
                            end=c(700,200,250,120,200))

  ## compare the regions first by eye
  ##  which ones do overlap and by what amount?
  regionsH3ac
  regionsH4ac

  ## compare it to the result (computed once and reused):
  overlap <- regionOverlap(regionsH3ac, regionsH4ac)
  as.matrix(overlap)
  whichCsr(overlap)

[Package Ringo version 1.6.0 Index]