Workflow automation?


I have to fill in my workflow parameters manually; most of them are file names and sample names. I have to select from about 7 files about 20 times and enter similar but slightly different sample names (e.g. wn0552.A7CE75785, wn0552.A7CE75785_tumor, wn0552.A7CE75785_normal) another 20 times. As you may guess, it is time-consuming, error-prone and in general not exactly the kind of work I dreamed of.

But I suspect I can automate this process. On a Linux command line, all I would need is echo, cat, ln, etc., and a few rarely used ASCII symbols. In Galaxy, however, I found only cat.

So essentially I want:

  • A tool that receives one file and passes it to the output unchanged, so I could connect its output to every step that uses that file and set its name only once. In Linux this could be cat or cp, but I would prefer ln or even a variable definition to save disk space and time.

  • A tool that receives a string and outputs it unchanged, like echo.

  • A tool that receives two strings and concatenates them, like echo $a$b. That way I could set my sample name to e.g. wn0552.A7CE75785, derive the other important strings from it, and use those strings in all my steps via “add connection to module”.

  • A tool that reports the number of processors on the current machine, so I can set the number of threads in some tools.
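
For illustration only, the string derivation described in the list above amounts to something like the following Python sketch. It is not a Galaxy tool; the function name derive_sample_names is hypothetical, and the suffixes are simply taken from the example sample names earlier in this post.

```python
def derive_sample_names(base, suffixes=("tumor", "normal")):
    """Return the base sample name plus one derived name per suffix."""
    names = {"base": base}
    for suffix in suffixes:
        # Same idea as `echo $a$b` in a shell: plain string concatenation.
        names[suffix] = f"{base}_{suffix}"
    return names


# Example: set the sample name once and derive the related names from it.
for label, name in derive_sample_names("wn0552.A7CE75785").items():
    print(label, name)
```

The point is that the sample name is entered once and every other string follows from it, which is the behaviour wanted from the workflow.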

Is this possible in Galaxy? Or is there an easier way to automate workflows? Or do I have to write wrappers for all these tools myself?

Thanks in advance.

Please check out Using Workflow Parameters; I think it covers all of this except the last part. Tools that can use multiple threads or cores can read $GALAXY_SLOTS, which is set according to the resources the admin has configured.
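
To illustrate the last point, a script run by a tool could read the thread count from the environment instead of detecting processors itself. This is only a minimal sketch, assuming the script is started by a Galaxy job so that GALAXY_SLOTS is present in the environment; the helper name get_thread_count and the fallback value of 1 are assumptions made for the example.

```python
import os


def get_thread_count(default=1):
    """Return the number of threads to use, based on GALAXY_SLOTS."""
    raw = os.environ.get("GALAXY_SLOTS", "")
    try:
        slots = int(raw)
    except ValueError:
        # Variable is missing or not a number: fall back to the default.
        return default
    # Guard against zero or negative values.
    return slots if slots > 0 else default
```

Falling back to a single thread when the variable is absent keeps the script usable outside Galaxy without grabbing more cores than were allocated.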

Thanks, Marius! It is exactly what I needed.

Add Compose text parameter value tool to the workflow
Add two more “Components” using the “Insert components” button
Add the Regex Find And Replace

Unfortunately, I cannot find any of the entities mentioned.

I found the “compose_text_param” tool in the Tool Shed. I think the article should mention that this tool is not shipped out of the box and has to be installed.

And one more question. I created an “input dataset” for the reference genome, but MergeBamAlignment (unlike other tools) does not accept it as input! It accepts the genome file from the history without problems, but then I have to select it three times per run. Is there a way to make MergeBamAlignment accept the output of another step as the reference genome? Or is there another way to define the reference genome (for the other tools too, if possible)?

Hi @wormball – this tool has an option to accept a dataset in fasta format. This can be from an upstream tool or an input dataset.

You might need to disconnect all tools, then reconnect them starting from the input through the downstream tools, in the order of processing. I’m guessing that the tool was originally set to use a natively indexed genome and was later changed to use a fasta from the history. Reconnecting the “noodles” resets the workflow’s metadata. If the genome is already an input dataset, you can also connect that input to multiple tools that need to incorporate it.

I unset the data type of my reference genome input field (it was “fasta”), and that allowed me to connect it to MergeBamAlignment (even though MergeBamAlignment expects fasta, as far as I can see). Thanks!