I have always strugled with unique mapping filtering. This is mainly because every aligner/mapper marks unique mappings differently. Some aligners assign a specific MAPQ (STAR – 255), some assign a SAM tag (bwa aln – XT:A:U), some just say if the alignment is likely to be unique or not (bwa mem – MAPQ 0 = definitely not unique, MAPQ > 0 more likely unique as the MAPQ increases).
Although many people do not support the concept of unique alignment I am a simple man and I like to know the number of mappings per reads. Especially with the long-read sequencing where we might want to exclude reads which have a strong support to be mapped to a single position from those that could be mapped to multiple regions.
To make this simple and aligner/mapped independent the solution is quite simple – count how many time we see a single read (by name) mapped in a SAM file. To solve this I made a very simple Python script which can do this counting and adds the number of alignments in a new SAM tag.
sam-counts-maps.py (very beta version) counts number of alignments per read name. So far it cannot handle paired-end reads and chimeric reads but I would love to extend it soon.
The use is rather simple:
$ samtools view -h test.bam | python sam-counts-maps.py --tag X0 | samtools view -bh - | samtools sort - > out.bam
The script cannot handle BAM files because I didn’t want to use pysam to make it more compatiable (and I don’t like pysam to read/write SAM/BAM because it likes to modify the files randomly).
The full options are:
usage: sam-count-maps.py [-h] [-i INPUT] [-o OUTPUT] [-t TAG] [-w]
Count number of mappings per read.
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input SAM file. Default: stdin
-o OUTPUT, --output OUTPUT
Output SAM file. Default: stdout
-t TAG, --tag TAG New tag with number of mappings to be added. Default: X0.
-w, --overwrite Do you want to overwrite if --tag already exists?
Default: Don't overwrite.
Then, you can just grep for number of alignments with the added tag. For example, to get unique alignments:
samtools view -h out.bam | grep -w -E "^@[A-Z][A-Z]|X0:i:1" | samtools view -bh - > unique.bam