Is it possible to include data in a workflow?

Is it possible to include data in a workflow definition?
By this I mean having a dataset in the actual workflow definition, not as something the user needs to provide as an input.
The use case would be where an input to a tool is invariant (e.g. a configuration file) and would not change across invocations of that workflow.

Anyone got any ideas on this?

Welcome @tdudgeon !

There is a new type of “simple inputs” for workflows. It can be used to pass tool form settings to tools. The intended use is to allow small changes at runtime (example: changing the target database to map against) without editing the tool settings within the workflow itself (possibly for multiple tools) each time a different configuration is needed while the overall processing stays the same.

Find it here in the workflow editor:

[Screenshot: workflow-simple-inputs]

Maybe this fits your needs?

That would solve part of the problem. But is this deployed yet? I don’t see it on usegalaxy.eu.

And, if I understand it correctly, it doesn’t completely address the problem, as it does not allow an input file to be specified as part of the workflow.

Click on the “Inputs” menu so that it expands. You’ll find the new-ish input options listed.

Correct, “data” files cannot be included in a workflow. Workflows are like a recipe – all of the “instructions” but none of the actual “ingredients”.

That said, tool wrappers can be designed to use reference data that is built in on the server (example: reference genomes and their indexes). There are two ways to make fixed data available to workflow users:

  1. Place the dataset you want others to use in a history and share or publish it so that it is accessible to others. Annotation in the workflow could point to the data with instructions for how to use it. Others would import a copy of that dataset into their own history, and then select it at workflow runtime. This would allow you to keep using a public Galaxy server.

  2. You could also create whatever other pre-set data you want to make available and set it up to be accessible in a Data Library (similar to a Published History). Or, that pre-set data could be “indexed” on a server and accessed by a tool wrapper. For both, you’ll need to set up your own Galaxy server, be an administrator, and write or modify an existing tool wrapper. If you think what you create would be useful to others, all of it could be published to the Galaxy ToolShed, including a Data Manager that would install the static data indexed correctly.

If you are interested in the second option, write back and we can point to local Galaxy and/or tool development resources, plus examples, and the development community chats/groups.

OK, so I found the “Simple inputs”, but it doesn’t seem to be of much use, as it only allows text, integer, float, boolean and colour inputs to be defined. But I guess that’s why it’s called simple!

Option 1 that you mention seems like the only way to go, but it’s a bit clunky: the user needs to know how to do this, and you end up with a copy of the data (or lots of copies of the data), which gets messy.

I was aware of reference datasets, but was informed that this was not a good solution for my use case (though I think it could be a possible solution).

Best of all would be able to provide data as part of the workflow!

If the data were very small, that may be something the developers would consider for a future release. For larger data, shared datasets in a history, built-in as an index, or placed in library data are the ways to go. Data could also be hosted somewhere else and a URL provided (the workflow user would import the data from that remote location – this is how we set up our tutorials so that they can be used across Galaxy servers, and not be bound to a specific instance).
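To make the shared-dataset approach a bit less clunky for users who run the workflow through the API, the fixed dataset can be wired in by a small script instead of being selected by hand. Below is a minimal Python sketch of that idea, not an official recipe. The step indices ("0", "1"), the placeholder IDs and the `build_workflow_inputs` helper are all hypothetical; the only Galaxy-specific assumption is the usual `{"src": ..., "id": ...}` form of a workflow input, where `hda` refers to a history dataset and `ld` to a library dataset.

```python
from typing import Dict, Optional

DatasetRef = Dict[str, str]

# Hypothetical values: input step "1" and the ID below are placeholders for
# the invariant dataset (for example a config file kept in a Data Library).
FIXED_INPUTS: Dict[str, DatasetRef] = {
    "1": {"src": "ld", "id": "CONFIG_DATASET_ID"},
}


def build_workflow_inputs(
    user_inputs: Dict[str, DatasetRef],
    fixed_inputs: Optional[Dict[str, DatasetRef]] = None,
) -> Dict[str, DatasetRef]:
    """Combine per-run inputs with the invariant ones.

    Raises ValueError if the caller tries to override a fixed input,
    so the "invariant" dataset really stays invariant.
    """
    fixed = FIXED_INPUTS if fixed_inputs is None else fixed_inputs
    clash = sorted(set(user_inputs) & set(fixed))
    if clash:
        raise ValueError(f"inputs {clash} are fixed and cannot be overridden")
    merged = dict(fixed)
    merged.update(user_inputs)
    return merged


# The user only supplies their own dataset; the config file is added for them.
inputs = build_workflow_inputs({"0": {"src": "hda", "id": "USER_DATASET_ID"}})
```

With BioBlend, a dictionary like `inputs` could then be passed as the `inputs` argument of `gi.workflows.invoke_workflow(...)`. Referencing a library dataset rather than a history copy also avoids piling up duplicates in each user's history.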

Note: data copied from a library does not count toward “quota” or consume extra disk space – it is just a “clone” dataset, referring back to the original.

I would suggest opening an issue at the primary Galaxy repository (galaxyproject/galaxy on GitHub), making your case (include examples, etc.), and seeing what the developers think. Currently, workflows are small and do not contain data on purpose. But an issue or enhancement request would open the discussion up to the wider community for feedback.