bedtools bamtobed doesn’t handle mismatch at read end well

I have PARE-Seq and RIBO-Seq experiments where I really care about the exact position of the first mapped nucleotide. The analysis I do relies on the great bedtools bamtobed. I was just wondering what happens if the aligner, by any chance, reports a mismatch (1X), an insertion (1I), or a softclipping (1S) at the mapped 5′ end.

Example SAM:

$ samtools view in.bam
match	256	NC_001133.9	549	0	26M	*	0	0	TTGCATCAGCGGTCTATACCCTGTGC	:?@D+4=2=20@<EHIIIIIIIGEHI
oneInser	256	NC_001133.9	549	0	1I25M	*	0	0	TTGCATCAGCGGTCTATACCCTGTGC	:?@D+4=2=20@<EHIIIIIIIGEHI
oneMism	256	NC_001133.9	549	0	1X25M	*	0	0	TTGCATCAGCGGTCTATACCCTGTGC	:?@D+4=2=20@<EHIIIIIIIGEHI
oneSoft	256	NC_001133.9	549	0	1S25M	*	0	0	TTGCATCAGCGGTCTATACCCTGTGC	:?@D+4=2=20@<EHIIIIIIIGEHI

will result in:

$ bedtools bamtobed -i in.bam
NC_001133.9     548     574     match   0       +
NC_001133.9     548     573     oneInser        0       +
NC_001133.9     548     574     oneMism        0       +
NC_001133.9     548     573     oneSoft 0       +

Insertion and softclipping are handled properly, and they are not being reported as a match. However, mismatch (1X) at the 5′ end is being reported as a match. It makes sense inside of the read, but it doesn’t make sense at the ends (either 5′ or 3′ mapped end). So be careful and check your mappings.

Leave a comment