Multi-select inputs to workflow and subworkflow

Is it possible to have workflow ‘inputs’ that allow multiple sample selection?

For example, the first step of ‘Workflow A’ uses starSOLO, which allows selection of multiple lanes. Both R1 files (and both R2 files) are handled by a single run of the tool, rather than by a batch job that runs twice.

If I set it up with ‘input datasets’ (below), then when I run the pipeline I can either select a single input from a dropdown or hit ‘multiple datasets’, which creates a batch job (“This is a batch mode input field. Separate jobs will be triggered for each dataset selection.”). Neither is what I want.

If, instead, I leave the inputs out as so:

I get the ‘ugly’ full version of the inputs rather than the nice parameterised summary (it’s petty, but this matters to me; I’m trying to make it nice for usability), but I can select the multiple files and it runs correctly.

However, that has knock-on effects. My “Workflow A” is essentially a subworkflow to be called by “Workflow B”. In Workflow B, I cannot connect any input to the R1/R2 inputs of Workflow A, so when I launch it there is no box for this input. If I hit ‘Expand to full workflow form’, right now* I can’t see any further options to expand the inputs for this subworkflow.

  • I thought I had previously seen this give the ‘single dropdown’ or batch selection options (so it could not run correctly), but that might have been when I had subworkflow A set to take an ‘input’.

I’m a fairly new Galaxy user, so I understand I might not be coming at this from the right angle.
What is the Galaxy Way™ of structuring this kind of input?


Dataset collections, and possibly group tags, are what you are looking for. See “Using Galaxy and Managing your Data”.

Try using one list collection for the barcodes (R1) and another list collection for the reads (R2). That should solve the multiple-input selection problem you show in the first vs second screenshot.
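If you build the two lists yourself, it is safest to keep them in the same lane order (with matching element identifiers), so that element 1 of the R1 list corresponds to element 1 of the R2 list. The sketch below is a hypothetical helper, not part of Galaxy. It assumes Illumina-style file names such as `sample_S1_L001_R1_001.fastq.gz` (the regular expression is an assumption; adjust it to your naming), and splits a set of files into two parallel, lane-sorted lists.

```python
import re

# Assumed naming convention: <prefix>_R<1|2>[_NNN].fastq[.gz] or .fq[.gz]
READ_RE = re.compile(
    r"^(?P<prefix>.+)_R(?P<read>[12])(?P<suffix>_\d{3})?\.f(ast)?q(\.gz)?$"
)


def split_reads(filenames):
    """Split file names into (R1 list, R2 list) in the same lane order."""
    r1, r2 = {}, {}
    for name in filenames:
        m = READ_RE.match(name)
        if m is None:
            raise ValueError(f"unrecognised file name: {name}")
        # Key on everything except the read number so mates share a key.
        key = (m.group("prefix"), m.group("suffix") or "")
        (r1 if m.group("read") == "1" else r2)[key] = name
    if r1.keys() != r2.keys():
        raise ValueError("every R1 file needs a matching R2 file")
    keys = sorted(r1)  # lane order, e.g. L001 before L002
    return [r1[k] for k in keys], [r2[k] for k in keys]
```

For example, `split_reads(["s_S1_L002_R1_001.fastq.gz", "s_S1_L001_R2_001.fastq.gz", "s_S1_L001_R1_001.fastq.gz", "s_S1_L002_R2_001.fastq.gz"])` returns the L001 file before the L002 file in both lists. Raising an error on an unmatched R1 or R2 catches a missing lane before any job is submitted.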

I also cross-posted this to the IWC chat. They may reply here or there, and feel free to join the chat on Matrix.


Thank you, that may well be it! I missed that option in starSOLO. Going to give it a try.


So, progress: a collection input is definitely the way. I’m now struggling with allowing a paired input collection as a workflow input.

I have two pairs (two lanes, one sample) of R1/R2 FASTQ files that I’d like to input to one run of the starSOLO tool. They are set up as a paired collection, and if I run the tool on that directly, it works. Fantastic!

So I changed the input type in my workflow too, and then created a ‘collection input’ (from the inputs menu). By default it wouldn’t let me connect it with a noodle (fair enough, the tool needs a collection of pairs). So I changed the collection type to ‘list:paired’, and then I could noodle that into the starSOLO input.

But if I configure this in my workflow as below, I get two independent starSOLO runs, one per lane.

I’m not sure how to specify that the whole input collection should be sent to one run (as when I run the tool manually) rather than split into one run per pair. Any thoughts?

For this part: after changing the input type, disconnect all the downstream noodles, then reconnect them in processing order. This “resets” the workflow metadata. I’m not sure if that is what you did; if not, try that first. It solves many workflow runtime issues.

If it is still a problem after reconnecting the noodles and rerunning, would you please post back a screenshot of the tool form settings you want to use, that is, the ones you want translated into the workflow? And maybe the same section of that tool form within your workflow (the side panel), similar to what you posted originally. Screenshots provide a lot of useful information :slight_smile:

I don’t think this tool “pools” samples, but let’s make sure.


OK, thanks. This makes sense now.

Disconnecting the downstream noodles makes things clearer in the flowchart: when I connect the paired collection, I see multiple noodles going into starSOLO, and when I reconnect the downstream steps I can see a multi-noodle output as well. (No screenshot of that part, because I ran out of fingers to take it during a mouse hover.) That matches the behavior of the workflow when run.

And you’re right, when run on a collection it runs per pair.

I had been confused because selecting multiple files directly on the tool form gives a single (pooled) run.
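To make the difference concrete, here is a toy Python model of the two behaviors described in this thread. It is not Galaxy code; `Pair` and `plan_jobs` are hypothetical names. When a `list:paired` input feeds a parameter that takes one pair, Galaxy maps over the outer list and creates one job per element (what happened in the workflow here). A parameter that accepted the whole `list:paired` collection would instead consume it in a single job, which is closer to the pooled multi-file selection on the tool form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One element of a list:paired collection: a forward/reverse pair."""
    forward: str
    reverse: str


def plan_jobs(collection: dict, param_type: str) -> list:
    """Return the inputs each job would receive (hypothetical model).

    param_type is the collection type the tool parameter accepts:
    - "paired": the outer list is mapped over, one job per element.
    - "list:paired": the whole collection goes to a single job.
    """
    if param_type == "paired":
        return [[pair] for pair in collection.values()]
    if param_type == "list:paired":
        return [list(collection.values())]
    raise ValueError(f"unsupported parameter type: {param_type}")


# Two lanes of one sample, as in the question (file names are illustrative).
lanes = {
    "L001": Pair("L001_R1.fastq.gz", "L001_R2.fastq.gz"),
    "L002": Pair("L002_R1.fastq.gz", "L002_R2.fastq.gz"),
}
print(len(plan_jobs(lanes, "paired")))       # 2 jobs, one per lane
print(len(plan_jobs(lanes, "list:paired")))  # 1 job with both lanes
```

The model only illustrates collection mapping; whether a given wrapper can accept a whole `list:paired` input depends on how that tool is wrapped.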

So I don’t think it’s possible to run starSOLO with multiple pairs for one sample in a workflow, since there’s no way to distinguish collection-into-one (this case) from collection-into-collection (the convention). I think I’ll just add an upstream concatenation step, since this is a subworkflow explicitly for one sample.
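The concatenation workaround can be sketched as follows, for illustration or for preparing files outside Galaxy. This is a minimal sketch assuming all lane files are either plain FASTQ or all gzipped; `concat_lanes` is a hypothetical helper. Byte-wise concatenation of gzip files produces a valid multi-member gzip stream, so no decompression is needed. The lane order must be identical for R1 and R2 so that read mates stay in sync.

```python
import shutil
from pathlib import Path


def concat_lanes(lane_files: list, out_file: Path) -> None:
    """Concatenate per-lane FASTQ files, in the given order, into one file.

    Works on raw bytes: gzipped inputs yield a multi-member gzip file,
    plain inputs yield a plain file. Do not mix the two.
    """
    with out_file.open("wb") as out:
        for lane in lane_files:
            with Path(lane).open("rb") as src:
                shutil.copyfileobj(src, out)  # stream; avoids loading into memory


# Usage (hypothetical file names), same lane order for both reads:
# concat_lanes([Path("s_L001_R1.fastq.gz"), Path("s_L002_R1.fastq.gz")], Path("s_R1.fastq.gz"))
# concat_lanes([Path("s_L001_R2.fastq.gz"), Path("s_L002_R2.fastq.gz")], Path("s_R2.fastq.gz"))
```

Streaming with `shutil.copyfileobj` keeps memory use flat even for large FASTQ files.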

Thanks again for your help!


Agree – or at least that’s how the tool is currently wrapped.

Hum. I am going to run a test on this and review with our developers. Maybe “pooled or not” can be added in as an option (later…). A few other tools have this type of toggle included.

Great workaround choice :slight_smile: Thanks for posting back the solution!