Implementing a command line tool with input and output folders as a Galaxy tool

Tillsa · July 5, 2022, 10:02am

Hi,
I want to create my own Galaxy tool based on a Python command line tool (READemption - A RNA-Seq Analysis Pipeline, READemption 2.0.0 documentation).
The tool is published on conda and PyPI and has a Docker image/container.
What would be the best practice for implementing this tool as a Galaxy tool?
Specifically I wonder how the flow of input, intermediate results and output files would be managed.

The tool does RNA-seq analysis, is a command line tool and has various subcommands:

create: creates an input folder structure where users can put their input files (reference sequences, annotation files and read files).
Calling this command and then running 'align' leaves input and output folders as follows:
READemption_analysis
├── config.json
├── input
│   ├── reads
│   ├── salmonella_annotations
│   └── salmonella_reference_sequences
└── output
    └── align
        ├── alignments
        ├── index
        ├── processed_reads
        ├── reports_and_stats
        │   ├── stats_data_json
        │   └── version_log.txt
        └── unaligned_reads

Now the user has to put their input files into the corresponding folders:
reference sequences go into READemption_analysis/input/salmonella_reference_sequences
annotation files go into READemption_analysis/input/salmonella_annotations
and reads go into READemption_analysis/input/reads

After providing the files, the user can run the first command, 'align', which aligns the reads to the reference sequences and creates output files such as statistics and BAM files. These output files are written to the corresponding output folders (READemption_analysis/output/align/alignments or READemption_analysis/output/align/reports_and_stats).

The other subcommands perform gene quantification, create coverage files or run differential gene expression analysis (using DESeq2). When called, each of these subcommands creates its own output folders and writes its result files into them.

So my question is: how do I implement such a tool, and how do I manage the flow of input and output files? The tool itself handles this by reading the corresponding input folders and, for later subcommands, the output folders of earlier ones, such as the folder where the BAM files are saved. Can I simply write all subcommands (we use argparse) as commands in Galaxy's tool.xml file, or do I have to implement our controller.py as a tool.xml or as a Galaxy workflow? Is it possible to keep the input/output folder structure concept when implementing the tool as a Galaxy tool?
Any help is appreciated and if you need further information please let me know.
Best wishes,
Till

jennaj · July 5, 2022, 11:58pm

Hi @Tillsa

Use Planemo for Galaxy tool development (GitHub: galaxyproject/planemo, command-line utilities to assist in developing Galaxy and Common Workflow Language artifacts, including tools, workflows, and training materials). Galaxy tools are generally each responsible for a single step/function in an analysis pipeline. Tools that perform common manipulations can be reused, and their parameters can be customized before or at runtime. If any of those steps already exist as wrapped tools, you don't need to recreate them.

While you might need to wrap a few novel functions as tools, it sounds like you would probably be more interested in designing one or more Workflows to bundle all the analysis steps so they run together, along with Dataset Collections (nested/grouped data, similar in concept to a "folder") to bundle data together as it moves through the workflow steps. Using both gives you a lot of control over how work is processed (outputting intermediate files, or not!) and also includes reporting options and more.
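If you do want to keep the folder concept inside a single wrapped step, one common pattern is to rebuild the expected layout inside the job's working directory and link the Galaxy-provided input files into it before calling the subcommand. Below is a minimal Python sketch of that staging step, not a tested READemption wrapper. The helper name `stage_inputs` and the `species` parameter are hypothetical, and the folder names are simply taken from the tree in the question.

```python
import tempfile
from pathlib import Path


def stage_inputs(project_dir, references, annotations, reads, species="salmonella"):
    """Symlink input files into the folder layout shown in the question.

    Hypothetical helper: the folder names follow the tree in the question,
    and `species` is the assumed prefix of the reference/annotation folders.
    """
    input_dir = Path(project_dir) / "input"
    targets = {
        input_dir / f"{species}_reference_sequences": references,
        input_dir / f"{species}_annotations": annotations,
        input_dir / "reads": reads,
    }
    staged = []
    for folder, files in targets.items():
        folder.mkdir(parents=True, exist_ok=True)
        for src in files:
            src = Path(src).resolve()
            link = folder / src.name
            # Skip files that were already staged under the same name.
            if not (link.is_symlink() or link.exists()):
                link.symlink_to(src)
            staged.append(link)
    return staged


if __name__ == "__main__":
    # Small demonstration with throwaway files in a temporary directory.
    work = Path(tempfile.mkdtemp())
    genome = work / "genome.fa"
    genome.write_text(">chr\nACGT\n")
    annotation = work / "genes.gff"
    annotation.write_text("##gff-version 3\n")
    sample = work / "sample.fq"
    sample.write_text("@r1\nACGT\n+\nIIII\n")
    for path in stage_inputs(work / "READemption_analysis", [genome], [annotation], [sample]):
        print(path)
```

Each later subcommand could then be a separate Galaxy tool that stages the outputs of the previous step in the same way, which is where Dataset Collections become useful for passing groups of files between steps.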

Please see:

Examples of production-ready workflows can be found at Dockstore and https://workflowhub.eu/.

Planemo can also help with the overall development and management of workflows, including more automation: Running Galaxy workflows (Planemo 0.74.11 documentation).
