findMateAlignment {Rsamtools}    R Documentation

Pairing the elements of a GappedAlignments object

Description

Utilities for pairing the elements of a GappedAlignments object.

Usage

findMateAlignment(x, verbose=FALSE)
makeGappedAlignmentPairs(x, use.names=FALSE, use.mcols=FALSE)

## Related low-level utilities:
getDumpedAlignments()
countDumpedAlignments()
flushDumpedAlignments()

Arguments

x

A named GappedAlignments object with metadata columns flag, mrnm, and mpos. Typically obtained by loading aligned paired-end reads from a BAM file with:

    param <- ScanBamParam(what=c("flag", "mrnm", "mpos"))
    x <- readBamGappedAlignments(..., use.names=TRUE, param=param)
    
verbose

If TRUE, then findMateAlignment will print some details about what is currently going on. Mostly useful for debugging.

use.names

Whether the names on the input object should be propagated to the returned object or not.

use.mcols

Names of the metadata columns to propagate to the returned GappedAlignmentPairs object.

Details

Pairing algorithm used by findMateAlignment

findMateAlignment is the workhorse used by higher-level functions like makeGappedAlignmentPairs and readBamGappedAlignmentPairs for pairing the records loaded from a BAM file containing aligned paired-end reads.

It implements the following pairing algorithm:

Ambiguous pairing

The above algorithm will find almost all pairs unambiguously, even when the same pair of reads maps to several places in the genome. Note that, when a given pair maps to a single place in the genome, looking at (A) is enough to pair the 2 corresponding records. The additional conditions (B), (C), (D), (E), (F), and (G) are only there to help in the situation where more than 2 records share the same QNAME. That works most of the time. Unfortunately, there are still situations where this is not enough to solve the pairing problem unambiguously.

For example, here are 4 records (loaded in a GappedAlignments object) that cannot be paired with the above algorithm:

Showing the 4 records as a GappedAlignments object of length 4:

GappedAlignments with 4 alignments and 2 metadata columns:
                    seqnames strand       cigar    qwidth     start       end
                       <Rle>  <Rle> <character> <integer> <integer> <integer>
  SRR031714.2658602    chr2R      +  21M384N16M        37   6983850   6984270
  SRR031714.2658602    chr2R      +  21M384N16M        37   6983850   6984270
  SRR031714.2658602    chr2R      -  13M372N24M        37   6983858   6984266
  SRR031714.2658602    chr2R      -  13M378N24M        37   6983858   6984272
                        width      ngap |     mrnm      mpos
                    <integer> <integer> | <factor> <integer>
  SRR031714.2658602       421         1 |    chr2R   6983858
  SRR031714.2658602       421         1 |    chr2R   6983858
  SRR031714.2658602       409         1 |    chr2R   6983850
  SRR031714.2658602       415         1 |    chr2R   6983850

Note that the BAM fields show up in the following columns:

As you can see, the aligner has aligned the same pair to the same location twice! The only difference between the 2 aligned pairs is in the CIGAR: one end of the pair is aligned twice to the same location with exactly the same CIGAR, while the other end of the pair is aligned twice to the same location but with slightly different CIGARs.

Now showing the corresponding flag bits:

     isPaired isProperPair isUnmappedQuery hasUnmappedMate isMinusStrand
[1,]        1            1               0               0             0
[2,]        1            1               0               0             0
[3,]        1            1               0               0             1
[4,]        1            1               0               0             1
     isMateMinusStrand isFirstMateRead isSecondMateRead isNotPrimaryRead
[1,]                 1               0                1                0
[2,]                 1               0                1                0
[3,]                 0               1                0                0
[4,]                 0               1                0                0
     isNotPassingQualityControls isDuplicate
[1,]                           0           0
[2,]                           0           0
[3,]                           0           0
[4,]                           0           0

As you can see, rec(1) and rec(2) are both second mates, and rec(3) and rec(4) are both first mates. But looking at (A), (B), (C), (D), (E), (F), and (G), the pairs could be rec(1) <-> rec(3) and rec(2) <-> rec(4), or they could be rec(1) <-> rec(4) and rec(2) <-> rec(3). There is no way to disambiguate!

So findMateAlignment simply ignores (with a warning) those alignments with ambiguous pairing, and dumps them in a place from which they can be retrieved later (i.e. after findMateAlignment has returned) for further examination (see the "Dumped alignments" subsection below for the details). In other words, alignments that cannot be paired unambiguously are not paired at all. Concretely, this means that readGappedAlignmentPairs is guaranteed to return a GappedAlignmentPairs object where every pair was formed in an unambiguous way. Note that, in practice, this approach doesn't seem to leave aside many records because ambiguous pairing events seem to be rare.
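The ambiguity can be reproduced on a toy copy of the 4 records using base R only. This is a minimal sketch, not the package's implementation: the integer flags 163 and 83 are inferred from the bit matrix shown above, the data frame column names are made up for the illustration, and the compatibility test in is_candidate_mate() is an assumed simplification of conditions (A) to (G).

```r
## Toy copy of the 4 records (base R only, hypothetical column names).
## Flags 163 and 83 are inferred from the bit matrix, not read from a BAM file.
recs <- data.frame(
    qname = rep("SRR031714.2658602", 4),
    rname = rep("chr2R", 4),
    pos   = c(6983850L, 6983850L, 6983858L, 6983858L),
    mrnm  = rep("chr2R", 4),
    mpos  = c(6983858L, 6983858L, 6983850L, 6983850L),
    flag  = c(163L, 163L, 83L, 83L),
    stringsAsFactors = FALSE
)

## Decode a few SAM flag bits into a 0/1 matrix with bitwAnd().
flag_bits <- c(isPaired = 0x1L, isProperPair = 0x2L,
               isMinusStrand = 0x10L, isMateMinusStrand = 0x20L,
               isFirstMateRead = 0x40L, isSecondMateRead = 0x80L)
bitmat <- sapply(flag_bits,
                 function(bit) as.integer(bitwAnd(recs$flag, bit) != 0L))

## Assumed simplified test: record j is a candidate mate for record i if the
## names match, each record points at the other's position and strand, and
## one is a first mate while the other is a second mate.
is_candidate_mate <- function(i, j) {
    i != j &&
    recs$qname[i] == recs$qname[j] &&
    recs$rname[i] == recs$mrnm[j] && recs$mrnm[i] == recs$rname[j] &&
    recs$pos[i] == recs$mpos[j] && recs$mpos[i] == recs$pos[j] &&
    bitmat[i, "isMinusStrand"] == bitmat[j, "isMateMinusStrand"] &&
    bitmat[i, "isMateMinusStrand"] == bitmat[j, "isMinusStrand"] &&
    bitmat[i, "isFirstMateRead"] != bitmat[j, "isFirstMateRead"]
}

idx <- seq_len(nrow(recs))
candidates <- lapply(idx, function(i)
    which(vapply(idx, function(j) is_candidate_mate(i, j), logical(1))))

## Every record ends up with 2 candidate mates, so no unique pairing exists.
lengths(candidates)
```

Under these assumptions, rec(1) and rec(2) each match both rec(3) and rec(4), and vice versa, which is the situation described above.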

Dumped alignments

Alignments with ambiguous pairing are dumped in a place ("the dump environment") from which they can be retrieved with getDumpedAlignments() after findMateAlignment has returned.

Two additional utilities are provided for manipulation of the dumped alignments: countDumpedAlignments for counting them (a fast equivalent to length(getDumpedAlignments())), and flushDumpedAlignments to flush "the dump environment". Note that "the dump environment" is automatically flushed at the beginning of a call to findMateAlignment.
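A typical way to inspect the dump after pairing might look like the sketch below. It only uses the functions documented on this page and the ex1.bam file from the Examples section, and it assumes the Rsamtools version documented here is installed. Whether ex1.bam actually produces any dumped alignments is not stated here, so the code handles both cases.

```r
library(Rsamtools)

bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
                       mustWork=TRUE)
param <- ScanBamParam(what=c("flag", "mrnm", "mpos"))
x <- readBamGappedAlignments(bamfile, use.names=TRUE, param=param)

## The dump environment is flushed at the start of this call, so whatever
## it contains afterwards comes from this call only.
mate <- findMateAlignment(x)

## Cheap check first; only retrieve the alignments if there are any.
ndumped <- countDumpedAlignments()
if (ndumped != 0L) {
    dumped <- getDumpedAlignments()
    print(head(dumped))
    ## Release the dumped alignments once they have been examined.
    flushDumpedAlignments()
}
```

Calling countDumpedAlignments() before getDumpedAlignments() avoids retrieving the dumped object when nothing was dumped.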

Value

For findMateAlignment: An integer vector of the same length as x, containing only positive or NA values, where the i-th element is interpreted as follows:

For makeGappedAlignmentPairs: A GappedAlignmentPairs object where the pairs are formed internally by calling findMateAlignment on x.

For getDumpedAlignments: NULL or a GappedAlignments object containing the dumped alignments. See "Dumped alignments" subsection in the "Details" section above for the details.

For countDumpedAlignments: The number of dumped alignments.

Nothing for flushDumpedAlignments.
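As an illustration of how the integer vector returned by findMateAlignment might be consumed, here is a small base R sketch on a made-up vector. It assumes that a non-NA value j at position i is the index in x of the mate of x[i], and that NA marks a record left unpaired. The vector below is hypothetical, not output from a real BAM file.

```r
## Hypothetical return value for an object x of length 6 (not real data).
mate <- c(2L, 1L, NA, 5L, 4L, NA)

## Indices of the records that were paired.
paired <- which(!is.na(mate))

## Under the assumption above, pairing is symmetric: the mate of the mate
## of record i is record i itself.
stopifnot(all(mate[mate[paired]] == paired))

## List each pair once by keeping the record with the smaller index.
first_idx <- paired[paired < mate[paired]]
pairs <- cbind(first = first_idx, mate = mate[first_idx])
pairs
```

With a real x, x[first_idx] and x[mate[first_idx]] would then give the two ends of each pair in matching order.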

Author(s)

H. Pages

See Also

GappedAlignments-class, GappedAlignmentPairs-class, readBamGappedAlignments, readBamGappedAlignmentPairs

Examples

bamfile <- system.file("extdata", "ex1.bam", package="Rsamtools",
                       mustWork=TRUE)
param <- ScanBamParam(what=c("flag", "mrnm", "mpos"))
x <- readBamGappedAlignments(bamfile, use.names=TRUE, param=param)
mate <- findMateAlignment(x)
head(mate)
table(is.na(mate))
galp0 <- makeGappedAlignmentPairs(x)
galp <- makeGappedAlignmentPairs(x, use.names=TRUE, use.mcols="flag")
galp
colnames(mcols(galp))
colnames(mcols(first(galp)))
colnames(mcols(last(galp)))

[Package Rsamtools version 1.12.0 Index]