How to get the directory path of a tool's input data?

Hi,

I started this topic here and was redirected to Galaxy Help.

In short, I need to know whether it’s possible to inject the directory path of a tool input into the command tag.

With $corpus I get the full path, including the name of the processed file.
Something like $corpus.directory (without the file name at the end) would be great.

The error triggered by my Tool => Caused by: java.lang.IllegalArgumentException: Not a directory: [/home/user/projects/galaxy/database/files/000/dataset_82.dat]

PS: Galaxy release 19.05

Thanks


Did this help from Bernt not resolve how to do this? I don’t think that accepting a complete directory as input will meet our coding standards (and functionality expectations), so it might introduce other problems if attempted. https://github.com/galaxyproject/galaxy/issues/8043#issuecomment-496300344

Without seeing the sources we won’t be able to help efficiently.


Hi there, and thanks for your feedback.

Here are the tool sources => termsuite

And the tool xml definition:

<tool id='termsuite' name='Termsuite' version='3.0.10'>
  <description>Termsuite</description>
  <command>java -cp $__tool_directory__/termsuite-core-3.0.10.jar fr.univnantes.termsuite.tools.TerminologyExtractorCLI 
                -t $__tool_directory__/treetagger 
                -c $corpus 
                -l $lng 
                --json $outfile
  </command>
  <inputs>
    <param name="corpus" type="data" format="text" label="Input corpus"/>
    <param name="lng" type="select" label="Corpus language">
        <option value="en">EN</option>
        <option value="fr">FR</option>
    </param>
  </inputs>
  <outputs>
    <data name='outfile' type="data" format='json' label="Output terminology"/>
  </outputs>
</tool>

Not really, unfortunately. I don’t see how to mix Cheetah with my java command…
It seems a bit complex from what I have read in the documentation.

I understand the constraint about meeting Galaxy coding standards, but we only need path information (without modifying the way Galaxy works…).

I’ve seen parameters like get_working_directory or $output.extra_path_file; wouldn’t it be possible to expose the path this way?

I once gave this answer to someone else, and maybe it applies to you as well. Move your java command to a bash file. In the bash file, create a temporary folder, move the input file to that folder and execute the java command. Then move the output files to the Galaxy output location variables and remove the temp folder.
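A minimal sketch of the staging step of such a wrapper, under some assumptions: the function name stage_input and the file name corpus.txt are hypothetical, and the input is symlinked rather than moved so the original dataset stays in place. The java flags in the commented usage are the ones from the tool definition posted above; the argument order is only an illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# stage_input FILE
# Links FILE into a fresh, uniquely named directory and prints that directory.
# (Hypothetical helper; "corpus.txt" is an arbitrary name chosen for this sketch.)
stage_input() {
    local input_file=$1
    local abs_dir staging
    # Resolve the directory of the input to an absolute path so the link stays valid
    abs_dir=$(cd "$(dirname "$input_file")" && pwd)
    # mktemp -d creates a new directory with a unique name on every call
    staging=$(mktemp -d)
    ln -s "$abs_dir/$(basename "$input_file")" "$staging/corpus.txt"
    printf '%s\n' "$staging"
}

# Possible use inside the wrapper (arguments: corpus file, language, output file):
#   corpus_dir=$(stage_input "$1")
#   trap 'rm -rf "$corpus_dir"' EXIT   # remove the temp folder when the script ends
#   java -cp termsuite-core-3.0.10.jar fr.univnantes.termsuite.tools.TerminologyExtractorCLI \
#        -t treetagger -c "$corpus_dir" -l "$2" --json "$3"
```

Writing --json straight to the output path passed in by Galaxy would avoid the extra step of moving the result afterwards.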


It seems that the program expects a directory as input, not a file.

Possible workaround

mkdir input
ln -s $corpus input/corpus

Then use input instead of $corpus as the parameter value for the program.


Hi,

Thanks for your feedback.

As expected from the docs, we have no choice but to use a wrapper in order to pass a directory path parameter to the tool (a Directory input type or a property such as $input.directory_path would be a nice addition to Galaxy).

So, as @gbbio proposed, we’ll have to add a bash file that interprets the arguments and works with $input’s path => It seems doable; I’ll give it a shot and tell what…

Also, I was wondering about this notice:

This is a batch mode input field. Separate jobs will be triggered for each dataset selection.

If I symlink the processed files (say 3 files) into my temp folder, my tool, whose purpose is to process many files at once to generate a whole terminology, will redundantly process 2 files out of 3 (as it processes all the files found in the temp folder)…

Thanks.
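One way to avoid the redundant processing might be to hand all datasets to a single job and link them into one directory before running the tool once. The sketch below assumes the job receives every dataset path as an argument (for example from a data parameter that accepts multiple datasets instead of batch mode); the function name link_corpus_files and the doc_N.txt naming are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# link_corpus_files DIR FILE...
# Links every FILE into DIR under a numbered name, so that dataset files
# sharing a base name (e.g. dataset_82.dat in different folders) cannot clash.
link_corpus_files() {
    local target_dir=$1
    shift
    mkdir -p "$target_dir"
    local i=0 f abs
    for f in "$@"; do
        i=$((i + 1))
        # Absolute path keeps the symlink valid wherever the tool is started
        abs=$(cd "$(dirname "$f")" && pwd)/$(basename "$f")
        ln -s "$abs" "$target_dir/doc_$i.txt"
    done
}

# Possible use: link_corpus_files input /path/to/a.dat /path/to/b.dat /path/to/c.dat
# then pass "input" to the tool's -c option once, for all three files.
```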

This workaround is not a good one (or I don’t fully understand it; correct me if I am wrong). If you use the input variable or a fixed “input” name to create the folder, it will go wrong if the user executes the tool a second time while the first run is still going.

Let’s say you run the tool twice at the same time with the same input but a different language setting. Then the temp files of both processes will be written to the same folder, and those files may even have the same file name. So the outputs for the different parameters will be mixed together. You need something that gives the folder a unique name per process.

It’s safe. Each job gets its own job directory.
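The point can be illustrated with a small simulation. This is only a sketch: the two directories created with mktemp stand in for two job directories, and run_job is a hypothetical helper that repeats the mkdir/ln workaround with a relative path inside each of them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_job JOB_DIR DATASET
# Mimics one job: the relative "input" folder is created inside JOB_DIR only.
run_job() {
    local job_dir=$1 dataset=$2
    (
        cd "$job_dir"
        mkdir input                     # relative path, resolved against this job's directory
        ln -s "$dataset" input/corpus
    )
}

root=$(mktemp -d)
mkdir "$root/job_1" "$root/job_2"       # stand-ins for two separate job directories
echo "first" > "$root/a.dat"
echo "second" > "$root/b.dat"

run_job "$root/job_1" "$root/a.dat"
run_job "$root/job_2" "$root/b.dat"

# Each job sees only its own input folder
cat "$root/job_1/input/corpus"
cat "$root/job_2/input/corpus"
```

Had both jobs shared one working directory, the second mkdir input would have failed, which is the collision described above.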
