Is it possible to have workflow ‘inputs’ that allow multiple-sample selection?
e.g. the first step of ‘Workflow A’ uses starSOLO, which allows selection of multiple lanes: both R1 files (and both R2 files) are handled by a single run of the tool (rather than a batch job run twice).
If I set it up with ‘input datasets’ (below), then upon running the pipeline I can select either a single input from a dropdown, or hit ‘multiple datasets’, which creates a batch job (“This is a batch mode input field. Separate jobs will be triggered for each dataset selection.”). Neither of these is what I want.
If, instead, I leave the input nodes out, like so:
I get the ‘ugly’ full version of the inputs rather than the nice parameterised summary (it’s petty, but this matters to me; I’m trying to make it nice for usability), but I can select the multiple files and it runs correctly.
However, that has knock-on effects. My “Workflow A” is essentially a subworkflow to be called by “Workflow B”. In Workflow B, I cannot connect any input to the R1/R2 inputs of Workflow A, so when I launch it there is no box for this input. If I hit ‘expand to full workflow form’, right now I can’t see any further options to expand the inputs for this subworkflow.
So progress. A collection input is definitely the way. I’m now struggling with allowing a paired input collection as a workflow input.
I have two pairs (lanes, one sample) of R1/R2 fastq files that I’d like to input to one run of the StarSOLO tool. They are set up as a paired collection, and if I run the tool on it directly, it works. Fantastic!
So I changed the input type in my workflow too, and created a ‘collection input’ (from the inputs menu). By default it wouldn’t let me connect via noodle (fair enough, it needs a collection of pairs), so I changed the collection type to ‘list:paired’, and then I could noodle that into the starsolo input.
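For reference, in Galaxy’s gxformat2 YAML representation of a workflow, that kind of input is declared roughly like this (a sketch only; the input name `fastq_lanes` is made up):

```yaml
class: GalaxyWorkflow
label: Workflow A
inputs:
  # A workflow-level collection input restricted to a list of dataset pairs,
  # i.e. the list:paired structure that can be noodled into the starsolo step.
  fastq_lanes:
    type: collection
    collection_type: list:paired
```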
But if I configure this in my workflow as below, I get two independent starSOLO runs, one per lane.
I’m not sure how to specify that the whole input collection should be sent to one run (as when running manually) rather than split one-pair-per-run. Any thoughts?
For this part: when changing an input type, disconnecting all the downstream noodles and then reconnecting them in processing order will “reset” the workflow metadata. I’m not sure if that is what you did; if not, try that first. It solves many workflow-runtime issues.
If it’s still a problem after reconnecting the noodles and trying a rerun, would you please post back a screenshot of the setting you want to use on the tool form (the one you want translated into the workflow), and maybe the section of that same tool form within your workflow (the side panel)? Sort of what you posted originally. These visually provide a lot of useful information.
I don’t think this tool “pools” samples but let’s make sure.
Disconnecting the downstream noodles makes things clearer in the flowchart: when I disconnect the dataset-pair connection I can see the noodle input to starSOLO, and when I reconnect it I can see the multi-noodle output as well. (No screenshot of that part because I ran out of fingers to take it with a mouse hover.) That matches the behaviour of the workflow when run.
And you’re right, when run on a collection it runs per pair.
So I don’t think it’s possible to run starsolo with multiple pairs going into one sample within a workflow, since there’s no way to distinguish collection-into-one-job (this case) from collection-into-collection (the convention). I think I’ll just add an upstream concatenation step; this is a sub-workflow explicitly for one sample.
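For the record, that concatenation step can be Galaxy’s built-in “Concatenate datasets” tool, because gzip streams can simply be joined byte-for-byte. Outside Galaxy the same thing is just `cat` (all filenames below are made up, and the first lines only create dummy lane files so the demo is self-contained):

```shell
# Create two dummy per-lane R1 files (stand-ins for real lane fastqs).
printf '@read1\nACGT\n+\nIIII\n' | gzip > sample_L001_R1.fastq.gz
printf '@read2\nTTTT\n+\nIIII\n' | gzip > sample_L002_R1.fastq.gz

# Concatenating gzip files yields a valid gzip file, so lanes can be
# merged per mate before a single starSOLO run (same for R2).
cat sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz > sample_R1.fastq.gz

# 8 lines = 2 reads x 4 fastq lines each.
gunzip -c sample_R1.fastq.gz | wc -l
```

The order of files matters only in that R1 and R2 must be concatenated in the same lane order, so the reads stay in sync.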