nf-core Pipeline Integration into a Local Galaxy

Hi everyone, I want to integrate some nf-core pipelines into my local Galaxy instance. I’m particularly interested in wrapping the nf-core/ampliseq pipeline using Docker.

Has anyone managed to write a proper tool XML file for this (or any other nf-core pipeline)? If so, I’d appreciate it if you could share your experience or any tips you might have. It would be really helpful as I work on this.

Thanks in advance!


Cross-reference posts for context

Resources

And this advice is from our senior developer:

That’s all just for context in case someone runs across this thread.

If anyone has done this and wants to share what they did, you are welcome to post here! 🙂


FWIW, some personal views:

For simple NF workflows that will be widely used, reimplementing them as native Galaxy workflows has many advantages: your work becomes available to the community in a computationally efficient form, with all the outputs available for downstream processing in Galaxy. That’s the recommended route where possible, and it’s usually easy if all the required tools are already available.

If new tools are needed, that requires specialised developer effort. There is substantial ROI if the new workflow is important to the community, but the tools have to be available first. Ampliseq looks very complex; implementing something equivalent or better would require a collaborative, community-led effort.

If computational efficiency is not an issue, the whole NF kit and kaboodle can be wrapped as a new Galaxy tool, because Nextflow is available from Conda. There’s a very crude example here where the input data are all URIs in the user-supplied YAML file, and the NF workflow has a special Python runner to set up the actual NF command line. That runner could provide a model for NF workflows that don’t have one.
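To make that concrete, here is a minimal, untested sketch of what such a wrapper could look like. Everything specific in it is an assumption: the tool id, the Nextflow version pin, the yaml datatype (fall back to txt if your server doesn’t register it), and the docker profile, which only works if Docker is already installed on the compute node. The pipeline itself is fetched from the network at run time.

```xml
<tool id="nf_core_ampliseq_wrapper" name="nf-core/ampliseq (whole-pipeline wrapper)" version="0.1.0">
    <description>runs the entire Nextflow pipeline as a single Galaxy job</description>
    <requirements>
        <!-- Nextflow is on Bioconda; the version pin here is illustrative -->
        <requirement type="package" version="23.10.1">nextflow</requirement>
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
        ## Docker must already be present on the compute node;
        ## input data are referenced as URIs inside the params file
        nextflow run nf-core/ampliseq
            -profile docker
            -params-file '$params_yaml'
            --outdir results
    ]]></command>
    <inputs>
        <param name="params_yaml" type="data" format="yaml"
               label="nf-core/ampliseq parameters (YAML, input data as URIs)"/>
    </inputs>
    <outputs>
        <!-- sweep up whatever the pipeline wrote so it lands back in the history -->
        <collection name="results" type="list" label="ampliseq results">
            <discover_datasets pattern="__designation_and_ext__" directory="results" recurse="true"/>
        </collection>
    </outputs>
</tool>
```

A dedicated Python runner, as in the crude example mentioned above, would replace the bare nextflow invocation here and build the full command line from the YAML.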

Main benefits:

  • Compared to running Nextflow on a command line, the results are identical, because Galaxy really does just run it on a command line.
  • When the NF workflow is updated, there is minimal effort to update the tool.

This approach has many, many problems and is rarely the right solution:

  • The tool runs the entire workflow as a single job, which:
    • misses out on all of Galaxy’s workflow management benefits;
    • requires the maximum RAM/CPU needed by any subworkflow (!) for the entire duration (see the job_conf sketch after this list);
    • depending on the workflow, may be a terrible waste.
  • The most tangible benefit is having the outputs in Galaxy for downstream processing.
  • On a private Galaxy, it may be a solution that requires far less skilled effort than a full conversion.
  • The computational inefficiency would be an unacceptable burden for public Galaxy servers.
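For a sense of what that resource point means in practice, here is an illustrative job_conf.xml excerpt, assuming a Slurm-backed Galaxy; the destination id, tool id, and numbers are all placeholders. The whole peak allocation is held for the full runtime, even while lightweight steps are running.

```xml
<!-- job_conf.xml excerpt (illustrative values): the wrapper must reserve
     the pipeline's PEAK resources for its entire runtime -->
<destinations>
    <destination id="nf_whole_pipeline" runner="slurm">
        <!-- sized for the hungriest subworkflow, held even during idle steps -->
        <param id="nativeSpecification">--mem=64G --cpus-per-task=16</param>
    </destination>
</destinations>
<tools>
    <tool id="nf_core_ampliseq_wrapper" destination="nf_whole_pipeline"/>
</tools>
```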