I never paid much attention to this (I should have), but TIL that by default bedtools merge merges not only overlapping regions but also book-ended ones.
From the bedtools merge manual:
"-d Maximum distance between features allowed for features to be merged. Default is 0. That is, overlapping and/or book-ended features are merged."
What does this mean? With the default settings (-d 0), bedtools merge will also merge the following regions:
# Two regions with book-end relationship - they are next to each other like books:
12345678
AAAA
    BBBB
# Will get merged with the default bedtools merge -d 0 settings into:
12345678
AAAABBBB
To avoid merging book-ended regions, you have to override the default with -d -1. The example above will then no longer be merged, and bedtools merge will merge only truly overlapping regions.
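To see why -d -1 makes the difference, here is a minimal Python sketch of the merge rule as I understand it. This is not bedtools code: merge_intervals is a hypothetical helper, and it assumes BED-style coordinates (0-based start, end not included), so the A and B regions above become (0, 4) and (4, 8). Two intervals are merged when the gap between them (next start minus current end) is at most d, so a book-ended pair has a gap of 0 and an overlapping pair has a negative gap.

```python
def merge_intervals(intervals, d=0):
    """Merge (start, end) intervals whose gap is at most d.

    Simplified model of the -d option (an assumption, not bedtools code).
    Coordinates are BED-style: 0-based start, end not included.
    """
    merged = []
    for start, end in sorted(intervals):
        # Gap to the last merged interval:
        # 0 means book-ended, negative means overlapping.
        if merged and start - merged[-1][1] <= d:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


# The A and B regions from the diagram, in BED-style coordinates.
book_ended = [(0, 4), (4, 8)]
# A made-up pair that shares one base.
overlapping = [(0, 5), (4, 8)]

default_result = merge_intervals(book_ended, d=0)     # gap 0 <= 0: merged
strict_result = merge_intervals(book_ended, d=-1)     # gap 0 > -1: kept apart
overlap_result = merge_intervals(overlapping, d=-1)   # gap -1 <= -1: merged

print(default_result)
print(strict_result)
print(overlap_result)
```

With d=0 the book-ended pair collapses into one region, while with d=-1 it stays as two regions and only the pair that shares at least one base is merged.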
Note: bedtools merge is smart enough to correctly merge VCF files that carry an END INFO field. This is useful, for example, when working with VCFs containing structural variants, such as CNVs. You don’t have to convert the VCF into BED, merge, and convert back to VCF.
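For intuition about what the END field contributes, here is a small Python sketch of how a VCF record maps to an interval. It is only an illustration under the VCF convention that POS and END are 1-based and inclusive; vcf_record_span is a hypothetical helper, the record is made up, and whether bedtools computes the span exactly this way internally is an assumption on my part.

```python
def vcf_record_span(line):
    """Return (chrom, start, end) in BED-style coordinates for one VCF data line.

    Hypothetical helper for illustration; not how bedtools is implemented.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, info = fields[0], int(fields[1]), fields[3], fields[7]
    # Collect key=value INFO entries; flag entries without "=" are skipped.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    # Use END when present, otherwise fall back to the length of REF.
    end = int(info_map["END"]) if "END" in info_map else pos + len(ref) - 1
    # VCF POS is 1-based; BED start is 0-based, so subtract one.
    return chrom, pos - 1, end


# A made-up deletion record with an END INFO field.
record = "chr1\t1001\t.\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=2000"
span = vcf_record_span(record)
print(span)
```

Without END, the same record would look like a single-base feature at position 1001, which is why honoring END matters when merging CNV calls.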