I have PARE-Seq and RIBO-Seq experiments where I really care about the exact position of the first mapped nucleotide. The analysis I do relies on the great bedtools bamtobed
. I was just wondering what happens if the aligner, by any chance, reports a mismatch (1X
), an insertion (1I
), or a softclipping (1S
) at the mapped 5′ end.
Example SAM:
$ samtools view in.bam
match 256 NC_001133.9 549 0 26M * 0 0 TTGCATCAGCGGTCTATACCCTGTGC :?@D+4=2=20@<EHIIIIIIIGEHI
oneInser 256 NC_001133.9 549 0 1I25M * 0 0 TTGCATCAGCGGTCTATACCCTGTGC :?@D+4=2=20@<EHIIIIIIIGEHI
oneMism 256 NC_001133.9 549 0 1X25M * 0 0 TTGCATCAGCGGTCTATACCCTGTGC :?@D+4=2=20@<EHIIIIIIIGEHI
oneSoft 256 NC_001133.9 549 0 1S25M * 0 0 TTGCATCAGCGGTCTATACCCTGTGC :?@D+4=2=20@<EHIIIIIIIGEHI
will result in:
$ bedtools bamtobed -i in.bam
NC_001133.9 548 574 match 0 +
NC_001133.9 548 573 oneInser 0 +
NC_001133.9 548 574 oneMism 0 +
NC_001133.9 548 573 oneSoft 0 +
Insertion and softclipping are handled properly, and they are not being reported as a match. However, mismatch (1X
) at the 5′ end is being reported as a match. It makes sense inside of the read, but it doesn’t make sense at the ends (either 5′ or 3′ mapped end). So be careful and check your mappings.