stdin and stdout decision in Python

The easiest automatic decision between defined input file or stdin and output file or stdout I have come up with in Python is:

import sys

# If intab (input) is defined, read it; if not assume stdin
if intab:
    f = open(intab, "r")
else:
    f = sys.stdin

# If otab (output) is defined, read it; if not assume stdout
if otab:
    fout = open(otab, "w")
else:
    fout = sys.stdout

# Do whatever you want 
for line in f:
    print(line)
    write.fout(line)

# And close files, just to be safe
if f is not sys.stdin:
    f.close()

if fout is not sys.stdout:
    fout.close()

There might be something easier but this is simple and obvious.

Count number of mappings (get unique and multi mappings)

I have always strugled with unique mapping filtering. This is mainly because every aligner/mapper marks unique mappings differently. Some aligners assign a specific MAPQ (STAR255), some assign a SAM tag (bwa alnXT:A:U), some just say if the alignment is likely to be unique or not (bwa memMAPQ 0 = definitely not unique, MAPQ > 0 more likely unique as the MAPQ increases).

Although many people do not support the concept of unique alignment I am a simple man and I like to know the number of mappings per reads. Especially with the long-read sequencing where we might want to exclude reads which have a strong support to be mapped to a single position from those that could be mapped to multiple regions.

To make this simple and aligner/mapped independent the solution is quite simple – count how many time we see a single read (by name) mapped in a SAM file. To solve this I made a very simple Python script which can do this counting and adds the number of alignments in a new SAM tag.

sam-counts-maps.py (very beta version) counts number of alignments per read name. So far it cannot handle paired-end reads and chimeric reads but I would love to extend it soon.

The use is rather simple:

$ samtools view -h test.bam | python sam-counts-maps.py --tag X0 | samtools view -bh - | samtools sort - > out.bam

The script cannot handle BAM files because I didn’t want to use pysam to make it more compatiable (and I don’t like pysam to read/write SAM/BAM because it likes to modify the files randomly).

The full options are:

usage: sam-count-maps.py [-h] [-i INPUT] [-o OUTPUT] [-t TAG] [-w]

Count number of mappings per read.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input SAM file. Default: stdin
  -o OUTPUT, --output OUTPUT
                        Output SAM file. Default: stdout
  -t TAG, --tag TAG     New tag with number of mappings to be added. Default: X0.
  -w, --overwrite       Do you want to overwrite if --tag already exists?
                        Default: Don't overwrite.

Then, you can just grep for number of alignments with the added tag. For example, to get unique alignments:

samtools view -h out.bam | grep -w -E "^@[A-Z][A-Z]|X0:i:1" | samtools view -bh - > unique.bam