Tools fail when run in a workflow

I wrote five tools to process genetic data. All of them were verified to work correctly outside Galaxy and also as isolated tools inside Galaxy. The intermediate results, produced by one tool and used as input to the next, are also OK. In short, everything works when the tools are used in isolation, but not when they are chained in a workflow. To illustrate, here are the XML files of the first two tools.

Tool 1: reads a paired collection of genetic data in which the genes are identified by Gene Symbol, and changes the identification to UniProt Id.
The current XML file is below:

<tool id="gs2up" name="GS2UP: Gene Symbol to UniProt" version="0.1.0">
  <description>-</description>
  <command>
    <![CDATA[ perl $__tool_directory__/GS2UP.pl $gstcga $gstcga.name $uptcga ]]>
  </command>
  <inputs>
    <param type="data" name="gstcga" format="tabular" />
  </inputs>
  <outputs>
    <data format="tabular" name="uptcga" label="up_$(gstcga.name)" />
  </outputs>
</tool>

Tool 2: reads a paired collection of genetic data in which the genes are identified by UniProt Id, and generates three output datasets for each input dataset. That is to say, this tool takes a paired collection as input and generates a triplet list.
The XML file is below:

<tool id="urgel" name="URGEL" version="0.1.0">
  <description>-</description>
  <command>
    <![CDATA[perl $__tool_directory__/URGEL.pl $tcga.forward $tcga.reverse $triplet.thrsh $triplet.diffrn $triplet.upreg ]]>
  </command>
  <inputs>
    <param name="tcga" type="data_collection" collection_type="paired" format="txt" />
  </inputs>
  <outputs>
    <collection name="triplet" type="list" label="ur_$(tcga.name)" >
      <data name="thrsh" format="txt" />
      <data name="diffrn" format="txt" />
      <data name="upreg" format="txt" />
    </collection>
  </outputs>
</tool>

As a newcomer to Galaxy, I may be making a mistake, but I can’t find it on my own and would appreciate any help.

Your first tool doesn’t work explicitly with collections.
Are you mapping a collection over it?

What version of Galaxy are you running?

Thank you for the answer.

Galaxy version 19.05.

Yes, I map a (paired) collection over it when running the tool alone (not in a workflow).

My first attempt used collections explicitly. I still have that “old” XML. However, the situation was the same: in isolation, the tool worked well and produced the expected results; in a workflow, it failed. So I dug into the documentation and found the following advice:

If a tool’s functionality can be applied to individual files in isolation, the implicit mapping described above should be sufficient and no knowledge of collections by tools should be needed. However, tools may need to process multiple files at once - in this case explicit collection consumption is required.

(from Advanced Tool Development Topics — Planemo 0.75.11 documentation)

Following that advice, I wrote the current version of the tool definition. The tool itself is very simple: it reads a two-column table, where the first column is an identifier (the Gene Symbol) and the second is a numerical value, and generates a new table with the identifier translated to UniProt Id, as required by the next workflow steps. The input is a paired collection and the output is a paired collection, but (as quoted) the tool doesn’t need to be aware of that.
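
For comparison, an explicit-collection variant of Tool 1 could look roughly like the sketch below. It is not the original “old” XML, only an illustration of the difference the quoted advice describes. The tool id `gs2up_paired` is hypothetical, the argument order of GS2UP.pl (input file, input name, output file) is assumed to match the definition above, and the label reuses the pattern from Tool 2.

```xml
<!-- Sketch only: hypothetical explicit-collection variant of Tool 1. -->
<tool id="gs2up_paired" name="GS2UP: Gene Symbol to UniProt (paired)" version="0.1.0">
  <description>-</description>
  <!-- Assumption: GS2UP.pl takes input file, input name, output file,
       so it is called once per element of the pair. -->
  <command>
    <![CDATA[
      perl '$__tool_directory__/GS2UP.pl' '$gstcga.forward' '$gstcga.forward.name' '$uptcga.forward' &&
      perl '$__tool_directory__/GS2UP.pl' '$gstcga.reverse' '$gstcga.reverse.name' '$uptcga.reverse'
    ]]>
  </command>
  <inputs>
    <!-- The tool consumes the paired collection itself instead of relying on implicit mapping. -->
    <param name="gstcga" type="data_collection" collection_type="paired" format="tabular" />
  </inputs>
  <outputs>
    <!-- A paired output collection declares its forward and reverse elements explicitly. -->
    <collection name="uptcga" type="paired" label="up_$(gstcga.name)">
      <data name="forward" format="tabular" />
      <data name="reverse" format="tabular" />
    </collection>
  </outputs>
</tool>
```

With implicit mapping, Galaxy runs the single-dataset tool once per element and assembles the output collection; in this variant the wrapper has to call the script for each element and declare the paired output itself.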

The second tool in the sequence, when run in isolation, recognizes the paired collection created by its predecessor and also produces its expected results. However, it too fails when placed in a workflow.

I solved it myself. In the “Edit Workflow” screen, there is an “Inputs” button for defining the types of the input data.

Thanks a lot.
