Is it possible to have workflow ‘inputs’ that allow multiple-sample selection?
e.g. the first step of ‘Workflow A’ uses starSOLO, which allows selection of multiple lanes: both R1 files (and both R2 files) are handled by a single run of the tool (rather than a batch job run twice).
If I set it up with ‘input datasets’ (below), then upon running the pipeline I can select either a single input from a dropdown, or hit ‘multiple datasets’, which creates a batch job (“This is a batch mode input field. Separate jobs will be triggered for each dataset selection.”). Neither of these is what I want.
If, instead, I leave the input nodes out, like so:
I get the ‘ugly’ full version of the inputs rather than the nice parameterised summary (it’s petty, but this matters to me; I’m trying to make it nice for usability), but I can select the multiple files and it runs correctly.
However, that has knock-on effects. My “Workflow A” is essentially a subworkflow to be called by “Workflow B”. In Workflow B, I cannot connect any input to the R1/R2 inputs of Workflow A, so when I launch it there is no box for this input. If I hit ‘expand to full workflow form’, right now I can’t see any further options to expand the inputs for this subworkflow.
So progress. A collection input is definitely the way. I’m now struggling with allowing a paired input collection as a workflow input.
I have two pairs (lanes, one sample) of R1/R2 fastq files that I’d like to input to one run of the StarSOLO tool. They are set up as a paired collection, and if I run the tool on it directly, it works. Fantastic!
So I changed the input type in my workflow too, and created a ‘collection input’ (from the inputs menu). By default it wouldn’t let me connect via noodle (fair enough, it needs a collection of pairs), so I changed the collection type to ‘list:paired’, and then I could noodle that into the starsolo input.
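For reference, in Galaxy’s gxformat2 YAML representation of a workflow, that kind of input is declared roughly like this (a sketch only; the input name `fastq_lanes` is made up):

```yaml
class: GalaxyWorkflow
label: Workflow A
inputs:
  # A workflow-level collection input restricted to a list of dataset pairs,
  # i.e. the list:paired structure that can be noodled into the starsolo step.
  fastq_lanes:
    type: collection
    collection_type: list:paired
```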
But if I configure this in my workflow as below, I get two independent starSOLO runs, one per lane.
I’m not sure how to specify that the whole input collection should be sent to one run (as when running manually) rather than split one-pair-per-run. Any thoughts?
For this part: when changing an input type, disconnecting all the downstream noodles and then reconnecting them in processing order will “reset” the workflow metadata. I’m not sure if that is what you did; if not, try that first. It solves many workflow-runtime issues.
If it’s still a problem after reconnecting the noodles and trying a rerun, would you please post back a screenshot of the setting you want to use on the tool form (the one you want translated into the workflow), and maybe the section of that same tool form within your workflow (the side panel)? Sort of what you posted originally. These visually provide a lot of useful information.
I don’t think this tool “pools” samples but let’s make sure.
Disconnecting the downstream noodles makes things clearer in the flowchart: when I disconnect the dataset-pair connection I can see the noodle input to starSOLO, and when I reconnect it I can see the multi-noodle output as well. (No screenshot of that part because I ran out of fingers to take it with a mouse hover.) That matches the behaviour of the workflow when run.
And you’re right, when run on a collection it runs per pair.
So I don’t think it’s possible to run starsolo with multiple pairs going into one sample within a workflow, since there’s no way to distinguish collection-into-one-job (this case) from collection-into-collection (the convention). I think I’ll just add an upstream concatenation step; this is a sub-workflow explicitly for one sample.
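For the record, that concatenation step can be Galaxy’s built-in “Concatenate datasets” tool, because gzip streams can simply be joined byte-for-byte. Outside Galaxy the same thing is just `cat` (all filenames below are made up, and the first lines only create dummy lane files so the demo is self-contained):

```shell
# Create two dummy per-lane R1 files (stand-ins for real lane fastqs).
printf '@read1\nACGT\n+\nIIII\n' | gzip > sample_L001_R1.fastq.gz
printf '@read2\nTTTT\n+\nIIII\n' | gzip > sample_L002_R1.fastq.gz

# Concatenating gzip files yields a valid gzip file, so lanes can be
# merged per mate before a single starSOLO run (same for R2).
cat sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz > sample_R1.fastq.gz

# 8 lines = 2 reads x 4 fastq lines each.
gunzip -c sample_R1.fastq.gz | wc -l
```

The order of files matters only in that R1 and R2 must be concatenated in the same lane order, so the reads stay in sync.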