Bundle pigz in containers

Hi! I’m analyzing a lot of long-read data with tools like Porechop and NanoLyse in nf-core pipelines (taxprofiler, mag), but the single-threaded BusyBox gzip is a huge slow-down when compressing ~100 GB files. For example, gzip compression alone can add an extra 6 hours on top of an 8-minute(!) NanoLyse job.

If pigz were bundled, compression could be multithreaded (Porechop is already built to prefer pigz when it’s available), which would cut down on that huge time cost. I’m sure many other tools/pipelines could use it too!
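To illustrate the fallback pattern described above, here is a minimal shell sketch: use pigz with multiple threads when it’s on `PATH`, otherwise fall back to plain gzip. The file name and thread count are placeholders, not anything from the actual pipelines.

```shell
#!/bin/sh
# Prefer pigz (parallel gzip) when available; fall back to single-threaded gzip.
# "reads.fastq" and the thread count (-p 8) are hypothetical placeholders.
if command -v pigz >/dev/null 2>&1; then
    COMPRESS="pigz -p 8"   # compress with 8 threads
else
    COMPRESS="gzip"        # single-threaded fallback (e.g. BusyBox gzip)
fi
$COMPRESS reads.fastq      # produces reads.fastq.gz in either case
```

The output is ordinary gzip-format data either way, so downstream tools don’t need to know which compressor ran.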

What do you guys think? Is this possible? …

Hi @Alex_Caswell

Is this a question for Galaxy development? Have you ported the nf-core workflows over into a Galaxy workflow? Thanks for clarifying! :slight_smile:

This is a question about nf-core workflows, but I understand that Galaxy maintains the Docker/Singularity images used by nf-core. Is that right? Sorry if it’s off-topic here, I’m just trying to find the right place to ask!! :sweat_smile:


That is an interesting view on it :slight_smile:
There is a project called BioContainers, where we maintain containers for all the different workflow engines. It’s unfortunately true that it’s mostly Galaxy people maintaining this for everyone; we also try to mirror and store those containers, but it does not need to be just Galaxy :slight_smile:

Back to your question: you can create a combined container as in add porechop and pigz by bgruening · Pull Request #3505 · BioContainers/multi-package-containers · GitHub, but you would still need to adapt your pipeline to use it.
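For the “adapt your pipeline” part, one common route with nf-core pipelines is a custom Nextflow config passed via `-c`, overriding the container for the affected process. This is only a sketch: the `withName` selector and the mulled image tag below are hypothetical placeholders, and the real tag comes from the merged multi-package-containers build.

```groovy
// custom.config -- hedged sketch; replace the image with the actual
// mulled-v2 tag produced by the BioContainers multi-package build.
process {
    withName: 'PORECHOP' {
        container = 'quay.io/biocontainers/mulled-v2-<hash>:<tag>'
    }
}
```

You would then run the pipeline with something like `nextflow run nf-core/taxprofiler -c custom.config ...` so the override applies without modifying the pipeline code itself.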
