Hi everyone, I want to integrate some nf-core pipelines into my local Galaxy. I’m particularly interested in wrapping the nf-core/ampliseq pipeline using Docker.
Has anyone managed to write a working tool XML file for this or any other pipeline? If so, could you share your experience or any tips you might have? It would be really helpful as I work on this.
For simple NF workflows that will be widely used, reimplementing them as a new Galaxy workflow has many advantages: your work becomes available to the community in a computationally efficient form, and all the outputs are available for downstream processing in Galaxy. That’s the recommended route where possible, and it is usually easy if all the tools are already available in Galaxy.
If new tools are needed, that requires specialised developer effort. There is a substantial return on that investment if the new workflow is important to the community, but the tools have to be wrapped for Galaxy first. Ampliseq looks very complex, so implementing something equivalent or better would need a collaborative, community-led effort.
If computational efficiency is not an issue, the whole NF kit and caboodle can be wrapped as a new Galaxy tool, because Nextflow is available in Conda. There’s a very crude example here where the data are all URIs in the user-supplied YAML file and the NF workflow has a special Python runner to set up the actual NF command line. That runner could provide a model for NF workflows without one.
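As a rough illustration of what such a runner does, here is a minimal Python sketch that assembles a Nextflow command line from a user-supplied params file. It is not the runner from the linked example. The function names, the default `docker` profile, and the file names are assumptions made for illustration; `-r`, `-profile` and `-params-file` are standard `nextflow run` options, and `--outdir` is the output parameter nf-core pipelines conventionally accept.

```python
import shlex
import subprocess
from typing import List, Optional


def build_nextflow_command(
    pipeline: str,
    params_file: str,
    outdir: str,
    profile: str = "docker",
    revision: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a single `nextflow run` invocation."""
    cmd = ["nextflow", "run", pipeline]
    if revision:
        # Pin a pipeline release so repeated runs stay reproducible.
        cmd += ["-r", revision]
    cmd += ["-profile", profile, "-params-file", params_file, "--outdir", outdir]
    return cmd


def run_pipeline(cmd: List[str]) -> int:
    """Run the command and return its exit code (what the Galaxy job would report)."""
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical file names; a Galaxy tool would pass in its own dataset paths.
    command = build_nextflow_command("nf-core/ampliseq", "params.yaml", "results")
    print(shlex.join(command))
```

In a Galaxy tool, the wrapper's command section would call a script like this, and the files written under the output directory would then be collected as Galaxy datasets.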
Main benefits
Compared to running Nextflow on a command line, the results are identical, because Galaxy really just runs it on a command line.
When the NF workflow is updated, there is minimal effort to update the tool.
This approach has many, many problems and is rarely the right solution.
The tool runs the entire workflow as a single job.
It misses out on all of Galaxy’s workflow management benefits.
It must reserve the maximum RAM/CPU required by any subworkflow (!) for the entire duration.
Depending on the workflow, that may be a terrible waste.
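To put a rough number on that waste, the sketch below compares the memory reserved by one job sized for the hungriest step against jobs sized per step. The step names, durations and RAM figures are invented purely for illustration, and the steps are assumed to run one after another.

```python
from typing import List, Tuple

# (step name, duration in hours, RAM needed in GB); hypothetical figures
Step = Tuple[str, float, float]


def gb_hours_single_job(steps: List[Step]) -> float:
    """One job holds the peak RAM of any step for the whole run."""
    total_hours = sum(hours for _, hours, _ in steps)
    peak_ram = max(ram for _, _, ram in steps)
    return total_hours * peak_ram


def gb_hours_per_step(steps: List[Step]) -> float:
    """Each step is scheduled separately with only the RAM it needs."""
    return sum(hours * ram for _, hours, ram in steps)


steps: List[Step] = [
    ("qc", 1.0, 4.0),
    ("denoise", 2.0, 64.0),
    ("taxonomy", 1.0, 16.0),
]

single = gb_hours_single_job(steps)   # 4 h * 64 GB = 256.0 GB-hours
split = gb_hours_per_step(steps)      # 4 + 128 + 16 = 148.0 GB-hours
print(f"single job: {single} GB-hours, per-step jobs: {split} GB-hours")
print(f"reserved but unused: {single - split} GB-hours")
```

With these made-up numbers, the single job reserves 108 GB-hours that no step ever uses; the gap grows the longer the low-memory steps run.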
The most tangible benefit is having outputs in Galaxy for downstream processing.
On a private Galaxy, it may be a solution that requires far less skilled effort than a conversion.
The computational inefficiency would be an unacceptable burden for public Galaxy servers.