RangedData-class {IRanges}    R Documentation

Data on ranges

Description

RangedData supports storing data, i.e. a set of variables, on a set of ranges spanning multiple spaces (e.g. chromosomes). Although the data is split across spaces, it can still be treated as one cohesive dataset when desired. In order to handle large datasets, the data values are stored externally to avoid copying, and the rdapply function facilitates the processing of each space separately (divide and conquer).

Details

A RangedData object consists of two primary components: a RangesList holding the ranges over multiple spaces and a parallel SplitXDataFrame, holding the split data. There is also an annotation slot for denoting the source (e.g. the genome) of the ranges and/or data.

There are two different modes of interacting with a RangedData. The first mode treats the object as a contiguous "data frame" annotated with range information. The accessors start, end, and width get the corresponding fields in the ranges as atomic integer vectors, undoing the division over the spaces. The [[ extraction function and the matrix-style [ subsetting function unroll the data in the same way, and [[<- does the inverse. The number of rows is defined as the total number of ranges and the number of columns is the number of variables in the data. It is often convenient and natural to treat the data this way, at least when the data is small and there is no need to distinguish the ranges by their space.

The other mode is to treat the RangedData as a list, with an element (a virtual Ranges/XDataFrame pair) for each space. The length of the object is defined as the number of spaces and the value returned by the names accessor gives the names of the spaces. The list-style [ subset function behaves analogously. The rdapply function provides a convenient and formal means of applying an operation over the spaces separately. This mode is helpful when ranges from different spaces must be treated separately or when the data is too large to process over all spaces at once.
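
The two modes can be pictured with a rough base-R analogy. The sketch below does not use any IRanges classes: a plain data.frame stands in for the ranges plus their data, the space column and the names df, by_space, and mean_width are hypothetical, and sapply over the split pieces only imitates the spirit of rdapply.

```r
# Base-R analogy only; no RangedData object is created here.
# `space` is a hypothetical column naming the space of each range.
df <- data.frame(
  space = c("chr1", "chr1", "chr1", "chr2", "chr2"),
  start = c(1L, 2L, 3L, 15L, 45L),
  end   = c(4L, 5L, 6L, 15L, 100L),
  score = c(10L, 2L, NA, 0L, 3L)
)

# Mode 1: one contiguous table; rows are ranges, columns are variables.
nrow(df)        # total number of ranges
df[["score"]]   # one variable, unrolled over all spaces

# Mode 2: a list with one element per space.
by_space <- split(df, df$space)
length(by_space)   # number of spaces
names(by_space)    # names of the spaces

# Process each space separately (divide and conquer).
mean_width <- sapply(by_space, function(d) mean(d$end - d$start + 1L))
```

Working on by_space keeps each operation confined to one space, which is the property the list mode is meant to provide when the full dataset is too large to handle at once.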

Accessor methods

In the code snippets below, x is a RangedData object.

The following accessors treat the data as a contiguous dataset, ignoring the division into spaces:

Array accessors:
nrow(x): The number of ranges in x.
ncol(x): The number of data variables in x.
dim(x): An integer vector of length two, essentially c(nrow(x), ncol(x)).
rownames(x): Gets the names of the ranges in x.
colnames(x): Gets the names of the variables in x.
dimnames(x): A list with two elements, essentially list(rownames(x), colnames(x)).
Range accessors: The type of the return value depends on the type of Ranges; for IRanges, it is an integer vector. Regardless, the number of elements is always equal to nrow(x).
start(x): The start value of each range.
width(x): The width of each range.
end(x): The end value of each range.

These accessors make the object seem like a list along the spaces:

length(x): The number of spaces (e.g. chromosomes) in x.
names(x): The names of the spaces (e.g. "chr1"). NULL or a character vector of the same length as x.
names(x) <- value: Set the names of the spaces, where value is either NULL or a character vector of the same length as x.

Other accessors:

annotation(object): Here, object is a RangedData object. Gets the scalar string identifying the source of the data in some way (e.g. genome, experimental platform, etc.).
ranges(x): Gets the ranges in x as a RangesList.
values(x): Gets the data values in x as a SplitXDataFrame.

Constructor

RangedData(ranges = IRanges(), ..., splitter = NULL, annotation = NULL): Creates a RangedData with the ranges in ranges and variables given by the arguments in .... See the constructor XDataFrame for how the ... arguments are interpreted. If splitter is NULL, all of the ranges and values are placed into the same space, resulting in a single-space (length one) RangedData. Otherwise, the ranges and values are split into spaces according to splitter, which is treated as a factor, like the f argument in split. The annotation may be specified as a scalar string by the annotation argument.
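
Since splitter is described as being treated like the f argument of split, the grouping rule can be sketched with base R alone. This is an illustration of split on plain vectors, not of the constructor itself; the names by_start, by_score, and single are hypothetical, and using a constant grouping vector to mimic splitter = NULL is an assumption made for the analogy.

```r
# Base-R sketch of the grouping rule only; no IRanges classes are involved.
start    <- c(1L, 2L, 3L)
score    <- c(10L, 2L, NA)
splitter <- c(1, 2, 1)   # coerced to a factor with levels "1" and "2"

# The same grouping is applied to the ranges and to every variable,
# so the pieces stay parallel within each group.
by_start <- split(start, splitter)
by_score <- split(score, splitter)
names(by_start)          # one name per group, taken from the factor levels

# Assumption: a constant grouping stands in for splitter = NULL,
# giving a single group that holds everything.
single <- split(start, rep(1L, length(start)))
length(single)
```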

Coercion

as.data.frame(x, row.names=NULL, optional=FALSE, ...): Copies the start, end, and width of the ranges and all of the variables as columns in a data.frame. This is a bridge to existing functionality in R, but care must be taken if the data is large. Note that optional and ... are ignored.
as(from, "XDataFrame"): Like as.data.frame above, except the result is an XDataFrame and it probably involves less copying, especially if there is only a single space.
as(from, "RangedData"): Coerces from to a RangedData, according to its class:
XRle: The bounds of the runs become the ranges and the values become a column named score.

Subsetting and Replacement

In the code snippets below, x is a RangedData object.

x[i]: Subsets x by indexing into its spaces, so the result is of the same class, with a different set of spaces. i can be numeric, logical, NULL or missing.
x[i,j]: Subsets x by indexing into its rows and columns. The result is of the same class, with a different set of rows and columns. Note that this differs from the subset form above, because we are now treating x as one contiguous dataset.
x[[i]]: Extracts a variable from x, where i can be a character, numeric, or logical scalar that indexes into the columns. The variable is unlisted over the spaces.
x[[i]] <- value: Sets value as column i in x, where i can be a character, numeric, or logical scalar that indexes into the columns. The length of value should equal nrow(x). x[[i]] should be identical to value after this operation.

Splitting and Combining

In the code snippets below, x is a RangedData object.

split(x, f, drop = FALSE): Split x according to f, which should be of length equal to nrow(x). Note that drop is ignored here. The result is a RangedDataList where every element has the same length (number of spaces) but different sets of ranges within each space.
c(x, ..., recursive = FALSE): Combines x with arguments specified in ..., which must all be RangedData instances. This combination acts as if x is a list of spaces, meaning that the result will contain the spaces of the first concatenated with the spaces of the second, and so on. This function is useful when creating RangedData instances on a space-by-space basis and then needing to combine them.
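
The list-like behaviour of c and the split-then-recombine round trip can be sketched with base R lists and vectors. This is an analogy that assumes each space behaves like a named list element; the names first, second, combined, groups, and restored are hypothetical, and no RangedData or RangedDataList objects are created.

```r
# Base-R analogy only: named list elements stand in for spaces.
score <- c(10L, 2L, NA)
f     <- c(1, 2, 1)

# Combining acts on the list of spaces: the elements of the first
# argument are followed by the elements of the second.
first    <- list(chr1 = c(10L, NA))
second   <- list(chr2 = 2L)
combined <- c(first, second)
names(combined)

# Splitting groups the values by f; unsplit() puts them back
# in their original order.
groups   <- split(score, f)
restored <- unsplit(groups, f)
identical(restored, score)
```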

Author(s)

Michael Lawrence

See Also

RangedData-utils for utilities and the rdapply function for applying a function to each space separately.

Examples

  ranges <- IRanges(c(1,2,3),c(4,5,6))
  filter <- c(1L, 0L, 1L)
  score <- c(10L, 2L, NA)

  ## constructing RangedData instances

  ## no variables
  rd <- RangedData()
  rd <- RangedData(ranges)
  ranges(rd)
  ## one variable
  rd <- RangedData(ranges, score)
  rd[["score"]]
  ## multiple variables
  rd <- RangedData(ranges, filter, vals = score)
  rd[["vals"]] # same as rd[["score"]] above
  rd[["filter"]]
  rd <- RangedData(ranges, score + score)
  rd[["score...score"]] # names made valid
  ## use an annotation
  rd <- RangedData(ranges, annotation = "hg18")
  annotation(rd)

  ## split some data over chromosomes

  range2 <- IRanges(start=c(15,45,20,1), end=c(15,100,80,5))
  both <- c(ranges, range2)
  score <- c(score, c(0L, 3L, NA, 22L))
  filter <- c(filter, c(0L, 1L, NA, 0L))
  chrom <- paste("chr", rep(c(1,2), c(length(ranges), length(range2))), sep="")

  rd <- RangedData(both, score, filter, splitter = chrom, annotation = "hg18")
  rd[["score"]] # identical to score
  rd[1][["score"]] # identical to score[1:3]

  ## subsetting

  ## list style: [i]

  rd[numeric()] # these three are all empty
  rd[logical()]
  rd[NULL]
  rd[] # missing, full instance returned
  rd[FALSE] # logical, supports recycling
  rd[c(FALSE, FALSE)] # same as above
  rd[TRUE] # like rd[]
  rd[c(TRUE, FALSE)]
  rd[1] # numeric index
  rd[c(1,2)]
  rd[-2]

  ## matrix style: [i,j]

  rd[,NULL] # no columns
  rd[NULL,] # no rows
  rd[,1]
  rd[,1:2]
  rd[,"filter"]
  rd[1,] # now by the rows
  rd[c(1,3),]
  rd[1:2, 1] # row and column
  rd[c(1:2,1,3),1] ## repeating rows

  ## variable replacement

  count <- c(1L, 0L, 2L)
  rd <- RangedData(ranges, count, splitter = c(1, 2, 1))
  ## adding a variable
  score <- c(10L, 2L, NA)
  rd[["score"]] <- score
  rd[["score"]] # same as 'score'
  ## replacing a variable
  count2 <- c(1L, 1L, 0L)
  rd[["count"]] <- count2
  ## numeric index also supported
  rd[[2]] <- score
  rd[[2]] # gets 'score'
  ## removing a variable
  rd[[2]] <- NULL
  ncol(rd) # is only 1

  ## combining/splitting

  rd <- RangedData(ranges, score, splitter = c(1, 2, 1))
  c(rd[1], rd[2]) # equal to 'rd'
  rd2 <- RangedData(ranges, score)
  unlist(split(rd2, c(1, 2, 1))) # same as 'rd'

[Package IRanges version 1.0.16 Index]