I like using deepTools in my analyses, but it can get very slow when you work with big(ger) files. Yes, it offers multithreading, but that often doesn’t help, and using a lot of threads already killed our server once (it wasn’t me, I swear).
One of the slowest parts I have encountered is the computeMatrix module. If your reference file (-R) has more than a couple thousand lines, it gets veeeery slow. After a few days of frustration and waiting for the matrix, I decided this was not the way to go.
It seems computeMatrix doesn’t scale well with an increasing number of reference positions. It turned out to be much more efficient to subsample the references, calculate the matrices separately and then merge them together. A simple table of the runtimes looks like this:
| Number of positions in -R | Time (seconds) |
| --- | --- |
| 1,000 | 4 |
| 5,000 | 20 |
| 10,000 | 58 |
| 25,000 | 275 |
The timings were measured on random subsamples of the reference file (shuf -n number_of_positions) for each number of positions, and the runtimes were averaged. It seems that, in this experiment, the sweet spot is somewhere around 5,000 positions, but it might be different in every experiment. Since my references have around 100,000-150,000 positions, splitting them into chunks of 5,000 doesn’t overload the system much.
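If you want to reproduce a benchmark like this, a minimal sketch could look like the following (signal.bw and references.bed are placeholder file names, not my actual data, and reference-point is just one of the computeMatrix modes):

```bash
# Benchmark sketch: subsample the reference BED to different sizes
# and time computeMatrix on each subsample (GNU time prints elapsed seconds).
for n in 1000 5000 10000 25000; do
    shuf -n "$n" references.bed > subsample_${n}.bed
    /usr/bin/time -f "%e s for ${n} positions" \
        computeMatrix reference-point \
            -S signal.bw \
            -R subsample_${n}.bed \
            -o matrix_${n}.gz
done
```

Running each size a few times and averaging the elapsed times gives numbers comparable to the table above.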
The final code looks like this: