countGenomicOverlaps {GenomicRanges} | R Documentation |
Count read hits per exon or transcript and resolve multi-hit reads.
## S4 method for signature 'GRangesList,GRangesList' countGenomicOverlaps( query, subject, type = c("any", "start", "end", "within", "equal"), resolution = c("none", "divide", "uniqueDisjoint"), ignore.strand = FALSE, splitreads = TRUE, ...)
query |
A GRangesList, or a GRanges of genomic features.
These are the annotations that define the genomic regions and will often
be the result of calling "exonsBy" or "transcriptsBy" on a
TranscriptDb object. If a GRangesList is
provided, each top level of the list represents a "super" such as a gene
and each row is a "sub" such as an exon or transcript. When |
subject |
A GRangesList, GRanges, or GappedAlignments representing the data (e.g., reads). List structures as the |
type |
See |
resolution |
A
|
ignore.strand |
A logical value indicating if strand should be considered when matching. |
splitreads |
A logical value indicating if split reads should be included. |
... |
Additional arguments, perhaps used by methods defined on this generic. |
The countGenomicOverlaps
methods use the findOverlaps
function in conjunction with a resolution method to identify overlaps
and resolve subjects (reads) that match multiple queries (annotation regions).
The usual type
argument of findOverlaps
is used to specify the type of overlap. The resolution
argument is used to select a method to resolve the conflict
when a subject hits more than 1 query. Here the
term ‘hit’ means an overlap identified by findOverlaps
.
The primary difference in the handling of split reads vs
simple reads (i.e., no gap in the CIGAR) is the portion of
the read hit each split read fragment has to contribute.
All reads, whether simple or split, have an overall value
of 1 to contribute to a query they hit. In the case of the
split reads, this value is further divided by the number of
fragments in the read. For example, if a split read has 3
fragments (i.e., two gaps in the CIGAR) each
fragment has a value of 1/3 to contribute to the query
they hit. As with the simple reads, depending upon the
resolution
chosen the value may be divided, fully
assigned or discarded.
More detailed examples can be found in the countGenomicOverlaps
vignette.
A vector of counts
Valerie Obenchain and Martin Morgan
## Not run: rng1 <- function(s, w) GRanges(seq="chr1", IRanges(s, width=w), strand="+") rng2 <- function(s, w) GRanges(seq="chr2", IRanges(s, width=w), strand="+") query <- GRangesList(A=rng1(1000, 500), B=rng2(2000, 900), C=rng1(c(3000, 3600), c(500, 300)), D=rng2(c(7000, 7500), c(600, 300)), E1=rng1(4000, 500), E2=rng1(c(4300, 4500), c(400, 400)), F=rng2(3000, 500), G=rng1(c(5000, 5600), c(500, 300)), H1=rng1(6000, 500), H2=rng1(6600, 400)) subj <- GRangesList(a=rng1(1400, 500), b=rng2(2700, 100), c=rng1(3400, 300), d=rng2(7100, 600), e=rng1(4200, 500), f=rng2(c(3100, 3300), 50), g=rng1(c(5400, 5600), 50), h=rng1(c(6400, 6600), 50)) ## Overlap type = "any" none <- countGenomicOverlaps(query, subj, type="any", resolution="none") divide <- countGenomicOverlaps(query, subj, type="any", resolution="divide") uniqueDisjoint <- countGenomicOverlaps(query, subj, type="any", resolution="uniqueDisjoint") data.frame(none = none, divide = divide, uniqDisj = uniqueDisjoint) ## Split read with 4 fragments : splitreads <- GRangesList(c(rng1(c(3000, 3200, 4000), 100), rng1(5400, 300))) ## Unlist both the splitreads and the query to see ## - read fragments 1 and 2 both hit query 3 ## - read fragment 3 hits query 7 ## - read fragment 4 hits query 11 and 12 findOverlaps(unlist(query), unlist(splitreads)) ## Use countGenomicOverlaps to avoid double counting. ## Because this read has 4 parts each part contributes a count of 1/4. ## When resolution="none" only reads that hit a single region are counted. split_none <- countGenomicOverlaps(query, splitreads, type="any", resolution="none") ## When resolution="divide" all reads are counted by dividing their count ## evenly between the regions they hit. Region 3 of the query was hit ## by two reads each contributing a count of 1/4. Region 7 was hit ## by one read contributing a count of 1/4. Regions 11 and 12 were both ## hit by the same read resulting in having to share (i.e., "divide") the ## single 1/4 hit read 4 had to contribute. split_divide <- countGenomicOverlaps(query, splitreads, type="any", resolution="divide") data.frame(none = split_none, divide = split_divide) ## End(Not run)