bedtools merge book-end merge

I never paid much attention to this (I should have), but TIL that, by default, bedtools merge combines not only overlapping regions but also book-ended ones.

From the bedtools merge manual, on the -d option: "Maximum distance between features allowed for features to be merged. Default is 0. That is, overlapping and/or book-ended features are merged."

What does this mean? With the default setting (-d 0), bedtools merge also merges regions like these:

# Two book-ended regions, sitting right next to each other like books on a shelf:
12345678
AAAA
    BBBB
# With the default bedtools merge -d 0 setting, they are merged into:
12345678
AAAABBBB
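In BED terms, the picture above is two half-open intervals whose coordinates touch. The short Python sketch below assumes the ruler is 1-based and that AAAA covers positions 1-4 and BBBB positions 5-8 (which is what the merged AAAABBBB implies); `to_bed` and `relation` are illustrative helpers, not part of bedtools. It shows that a book-ended pair has a gap of exactly 0: A's end equals B's start, yet the two intervals share no base.

```python
def to_bed(first: int, last: int) -> tuple[int, int]:
    """Convert 1-based, inclusive ruler positions to BED's 0-based, half-open [start, end)."""
    return first - 1, last


def relation(a: tuple[int, int], b: tuple[int, int]) -> str:
    """Classify two BED intervals (a starts first) by the gap b.start - a.end."""
    gap = b[0] - a[1]
    if gap < 0:
        return "overlapping"
    if gap == 0:
        return "book-ended"
    return "separated"


a = to_bed(1, 4)  # AAAA on ruler positions 1-4 -> BED (0, 4)
b = to_bed(5, 8)  # BBBB on ruler positions 5-8 -> BED (4, 8)

# Bases actually covered by each interval; book-ended intervals share none.
shared_bases = set(range(*a)) & set(range(*b))
print(a, b, relation(a, b), "shared bases:", len(shared_bases))
```

Since -d is the maximum distance allowed between features, -d 0 treats the negative-gap (overlapping) and zero-gap (book-ended) cases the same way.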

To avoid merging book-ended regions, pass -d -1 instead of relying on the default. The example above is then no longer merged, and bedtools merge combines only regions that truly overlap (by at least 1 bp).
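Here is a minimal Python sketch of the merge rule, assuming -d means "merge the next interval into the current block when next start minus current end is at most d". That is a reading of the manual text quoted above, not bedtools' source code, and `bed_merge` is a hypothetical helper. With d = 0 the book-ended pair collapses into one interval; with d = -1 the gap must be negative, so intervals need at least one shared base.

```python
def bed_merge(intervals: list[tuple[str, int, int]], d: int = 0) -> list[tuple[str, int, int]]:
    """Merge BED intervals whose gap (next start - current end) is <= d.

    Sketch of the documented -d behaviour, not bedtools' implementation.
    Like bedtools merge, it expects input sorted by chromosome, then start.
    """
    out: list[tuple[str, int, int]] = []
    for chrom, start, end in intervals:
        if out and out[-1][0] == chrom and start - out[-1][2] <= d:
            # Extend the current block; keep the larger end in case of containment.
            prev_chrom, prev_start, prev_end = out[-1]
            out[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            # Different chromosome or gap larger than d: start a new block.
            out.append((chrom, start, end))
    return out


book_ended = [("chr1", 0, 4), ("chr1", 4, 8)]   # the AAAA/BBBB example
overlapping = [("chr1", 0, 5), ("chr1", 4, 8)]  # shares one base (index 4)

for d in (0, -1):
    print(f"-d {d}:", bed_merge(book_ended, d), bed_merge(overlapping, d))
```

Taking `max(prev_end, end)` matters when one interval sits entirely inside another; without it, a short contained interval would shrink the merged block.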

Note: bedtools merge is smart enough to correctly merge VCF files that have an END INFO field. This is useful, for example, when working with structural-variant VCFs such as CNV calls: you don’t have to convert the VCF into BED, merge, and convert back to VCF.
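For comparison, the coordinate step of the manual VCF-to-BED route might look like the Python sketch below. It assumes a record spans POS..END (1-based, inclusive) when END is present in INFO, and POS..POS+len(REF)-1 otherwise; `vcf_record_to_bed` and the example record are hypothetical, and this is not necessarily the exact convention bedtools applies internally.

```python
def vcf_record_to_bed(line: str) -> tuple[str, int, int]:
    """Turn one tab-separated VCF data line into a BED-style (chrom, start, end).

    Assumption: the record covers POS..END (1-based, inclusive) if INFO has END,
    otherwise POS..POS+len(REF)-1. The BED start is therefore POS - 1.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    # INFO is ';'-separated key=value pairs; flags (no '=') and '.' are skipped.
    info_kv = dict(kv.split("=", 1) for kv in info.split(";") if "=" in kv)
    end = int(info_kv["END"]) if "END" in info_kv else pos + len(ref) - 1
    return chrom, pos - 1, end


# Hypothetical deletion CNV record with POS 1001 and END 5000.
record = "chr1\t1001\tcnv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=5000;IMPRECISE"
print(vcf_record_to_bed(record))
```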
