I am working on creating standard workflows for the common data processing pipelines that we run in my bioinformatics core facility, and I hope to keep the “Run workflow” forms as simple as possible for our users. The default hiding behavior when all inputs are filled has already streamlined things quite a bit, but I am finding the simplicity of the workflow “input” tools a bit confining. I prefer the mechanics offered by the tool XML specification, for example using `<conditional>` tags to let users simply choose between paired-end and single-end input data, or offering dropdowns for selecting a genome index based on dbkey.
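For illustration, a rough sketch of the kind of tool-form inputs described above. The parameter names and the `bowtie2_indexes` data table are placeholders. Which data tables exist depends on the server's configuration, so treat this as an example of the pattern rather than a working tool.

```xml
<inputs>
    <!-- Let the user pick the library layout; only the matching input is shown -->
    <conditional name="layout">
        <param name="type" type="select" label="Library layout">
            <option value="single">Single-end</option>
            <option value="paired">Paired-end</option>
        </param>
        <when value="single">
            <param name="reads" type="data_collection" collection_type="list" format="fastqsanger" label="Reads"/>
        </when>
        <when value="paired">
            <param name="reads" type="data_collection" collection_type="list:paired" format="fastqsanger" label="Read pairs"/>
        </when>
    </conditional>
    <!-- Dropdown populated from a server-side data table (placeholder table name) -->
    <param name="index" type="select" label="Reference genome index">
        <options from_data_table="bowtie2_indexes">
            <filter type="sort_by" column="2"/>
            <validator type="no_options" message="No indexes are available"/>
        </options>
    </param>
</inputs>
```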
Because of this, I was considering using a custom tool to handle all of the workflow's inputs; the tool would simply pass its input collections through to its outputs. So far, I have been able to build a tool that maps over input collections using the `structured_like` attribute on the output collection and copies the individual datasets into the output collection. However, this approach has the drawback that all input data is duplicated. I also tested symlinking the `.dat` files directly to avoid copying, but this still appears to increase quota usage, and I worry that users could break the links by deleting the original datasets.
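A minimal sketch of the copy-based pass-through tool described above, assuming a flat `list` input collection. The tool id, name, and parameter names are hypothetical. Note that this version still writes a full copy of every dataset, which is exactly the quota drawback in question.

```xml
<tool id="passthrough_inputs" name="Workflow inputs (pass-through)" version="0.1.0">
    <command><![CDATA[
        ## For each element identifier in the input list, copy the dataset
        ## into the matching element that structured_like pre-creates.
        #for $key in $input1.keys()
            cp '$input1[$key]' '$output[$key]' &&
        #end for
        true
    ]]></command>
    <inputs>
        <param name="input1" type="data_collection" collection_type="list" label="Input collection"/>
    </inputs>
    <outputs>
        <!-- Output mirrors the input's structure and element identifiers -->
        <collection name="output" type="list" structured_like="input1" inherit_format="true" label="Pass-through"/>
    </outputs>
</tool>
```

The `&&` chain ending in `true` makes the job fail if any single copy fails, rather than silently producing an incomplete collection.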
An ideal tool would copy input collections to a new output the way collection operation tools such as Build List do, without increasing quota usage. Is there any way to accomplish this with Python, Cheetah, or the tool definition XML? I see that the built-in collection operations call methods in the `galaxy.tools` module via `<type>` tags, in a way that doesn't seem intended for general use.
I also welcome suggestions for alternate approaches. Thanks!