Custom tool XML - how can I get the original file name (e.g. bin1.fasta) instead of just dataset_###.dat from input files?

phagepower · July 24, 2020, 10:44pm

Hello,

I’m working on adding a couple of custom tools to our Galaxy install, and one problem I’m running into is with the names of datasets.
More specifically, one tool we use requires all input files to be in a certain “input” directory. So the first thing my custom tool XML does is copy (using cp) the datasets from the list the user selects as input (see below) into this folder. However, the datasets all have names such as dataset_9964.dat, dataset_9967.dat, etc., rather than their history/original names such as bin1.fasta, bin2.fasta, etc.

As a result, when I run this tool, the TSV output (see second image below) thinks that all the user_genomes I submitted to the tool are called dataset_9964, dataset_9963, etc., because it uses the input file names to determine the row names in the output.

I’m wondering if there’s any way I can retrieve the original files’ names upon job submission.

Our input section for the tool XML is as follows:

<inputs>
    <param name="input_files" type="data" multiple="true" format="fasta" label="Input FASTA files" />
</inputs>

Input list/collection of FASTA files:

First column of the output TSV (column name “user_genome”); each row should represent a different input file (dataset_9963 = bin1.fasta, etc.):

Thanks in advance!

innovate-invent · July 24, 2020, 11:55pm

The ParSNP tool has a similar requirement.

See https://github.com/brinkmanlab/galaxy-tools/blob/00868930fc05f48a702fe8357b58f004cf899238/parsnp/ParSNP.xml#L20-L27

Basically, you don’t need to copy the files; just create symlinks in a working folder.
The name of the dataset is accessible as $input_file.element_identifier.
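For reference, here is a minimal sketch of that pattern applied to the `input_files` parameter from the original question. It is not copied from the ParSNP wrapper: `your_tool`, its `--input-dir` option, and the `output` data element are placeholders, and the `re.sub` call (which replaces anything other than letters, digits, `_`, `.` and `-` with `_`) is an extra safeguard so that a name containing spaces, slashes or quotes still gives a valid file name.

```xml
<command detect_errors="exit_code"><![CDATA[
    #import re
    ## Stage each selected dataset in a working "input" folder under its
    ## history/collection name (e.g. bin1.fasta) instead of dataset_NNNN.dat.
    mkdir -p input &&
    #for $genome in $input_files
        ## Sanitize the name: anything unusual (spaces, slashes, quotes) becomes "_".
        #set $safe_name = re.sub('[^A-Za-z0-9_.-]', '_', str($genome.element_identifier))
        ## '$genome' renders the dataset's path on disk; the link gets the readable name.
        ln -s '$genome' 'input/${safe_name}' &&
    #end for
    ## Placeholder command: substitute the real tool and its options.
    your_tool --input-dir input --output '$output'
]]></command>
```

This assumes the downstream tool accepts symlinked inputs. If it does not, `cp` with the same target name should behave the same way, at the cost of actually copying the data.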

phagepower · July 25, 2020, 12:02am

@innovate-invent - Thanks for the link, and exact lines of code!
So if I understand correctly, the line below isn’t just iterating over an array of paths but over an array of objects, and we can then look at each object’s attributes, such as $genome.element_identifier?
I come from a lot of object-oriented programming and am trying to piece this together in the context of Galaxy tools.

    #for $genome in $genomes

innovate-invent · July 25, 2020, 12:44am

Yes, this link may help: https://pythonhosted.org/Cheetah/users_guide/language.html

An input with multiple=true or a collection is handed to the template as an iterable of “dataset” objects. When cast to a string, these objects return the dataset path. They also have other useful properties that describe the dataset, including tags and metadata.
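To illustrate the difference between the string value and the other properties, the hypothetical fragment below writes one row per selected dataset with its element identifier, its Galaxy datatype extension (`$genome.ext`) and its on-disk path. The `output` data element is an assumption, and identifiers containing a single quote would need escaping.

```xml
<command detect_errors="exit_code"><![CDATA[
    ## Header row for a small lookup table.
    printf 'element_identifier\text\tpath\n' > datasets.tsv &&
    #for $genome in $input_files
        ## Columns: history/collection name, datatype extension, path on disk.
        printf '%s\t%s\t%s\n' '${genome.element_identifier}' '${genome.ext}' '$genome' >> datasets.tsv &&
    #end for
    mv datasets.tsv '$output'
]]></command>
```

A table like this could also be used to map names such as dataset_9964 in another tool’s output back to bin1.fasta and so on.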

If you want a peek into the data structures provided to the template, you can install this tool: https://github.com/brinkmanlab/galaxy-tools/blob/master/inspect/inspect.xml
Don’t install this tool on a public server, as it reveals all sorts of internal information like API keys and whatnot.

phagepower · July 25, 2020, 1:05am

I’ll have a look at the Cheetah user/language guide, thanks. Also, the code you linked previously for ParSNP worked exactly as desired! Cheers.

phagepower · July 25, 2020, 1:06pm

@innovate-invent - have you ever used the Inspect tool? I set it up but keep getting an error when submitting a job, and I can’t narrow down where it’s coming from:

galaxy.jobs.runners ERROR 2020-07-25 06:01:07,013 [p:22000,w:1,m:0] [SlurmRunner.work_thread-0] (3193) Failure preparing job
Traceback (most recent call last):
  File "lib/galaxy/jobs/runners/__init__.py", line 236, in prepare_job
    job_wrapper.prepare()
  File "lib/galaxy/jobs/__init__.py", line 1085, in prepare
    self.command_line, self.extra_filenames, self.environment_variables = tool_evaluator.build()
  File "lib/galaxy/tools/evaluation.py", line 462, in build
    raise e
AttributeError: 'DatasetListWrapper' object has no attribute 'input'

innovate-invent · July 25, 2020, 5:58pm

I have used it, but it doesn’t surprise me that it fails on a newer version of Galaxy.
It does all sorts of things that the Galaxy devs never anticipated.
The last line of the error is the most revealing: somewhere in the tool template code, something is trying to access an input attribute.
Unfortunately, Galaxy still doesn’t output debugging information from within the template itself.
The error involves the DatasetListWrapper, which tells me that the multidata section is the problem.
You can try commenting out or removing the multidata section. This could be a bug in Galaxy that my tool is exposing.

phagepower · July 25, 2020, 6:07pm

Yup, sure enough, that fixed it. The closest bug report I could find was this one (https://github.com/galaxyproject/galaxy/pull/6317), which was fixed in June 2018, although there the attribute was ‘value’ rather than ‘input’.

That’s a lot of output data to go through. I do see element_identifier in there too; this will be useful for future tool development. Thank you!
