How Galaxy deals with output files in the XML tool definition file

Hi all,
I created a bash script that runs a tool, takes some of the output files that the tool puts in a directory I created for it, and copies them to the paths held in some bash variables:
cp ${output_dir}/file1.txt $2

cp ${output_dir}/file2.txt $3

Therefore, in the XML tool definition file that runs this bash script, I specify these outputs:

sh -e $__tool_directory__/tool.sh
      $input 
      $file1
      $file2

  <outputs>
    <data name="file1" format="txt" label="file 1"/>
    <data name="file2" format="txt" label="file 2"/>
  </outputs>

So I force Galaxy to take these output files from a directory created by me. Is this wrong? Or should I let Galaxy put the output files in the default directory?

Hi Patri,
Can you explain in more detail what your tool should do? Can you paste the tool.sh script?

This is a bash wrapper that runs my Python tool (mytool.py), which takes one input and produces two outputs. The tool puts these outputs in a folder created for it, and I force Galaxy to take the outputs from this folder. In fact, after I run this tool in Galaxy, all the files are in the folder “/results/output-tools/MYTOOL/sessions/${session_dir}” and not in /database/files, the default folder in which Galaxy stores outputs. Is this OK, or could it cause some sort of problem, given that I am running a Galaxy instance on a server with a lot of tools that I integrated in this way?

#!/bin/sh

# run program
run_output=$(python mytool.py --input $1)

# get session name
session_dir=$(echo "${run_output}" | grep "Session:" | sed "s/^Session:\ //")

# folder with the session name
output_dir="/results/output-tools/MYTOOL/sessions/${session_dir}"

# check if the output files exist
for f in "file1.txt" "file2.txt"; do
        if [ ! -e "${output_dir}/$f" ]; then
                exit 1
        fi
done

# copy the results to the output paths passed in by Galaxy
cp "${output_dir}/file1.txt" "$2"
cp "${output_dir}/file2.txt" "$3"

It looks complicated. I personally create a temp directory like:
tempfolder=$(mktemp -d /galaxy/galaxy/database/files/XXXXXX)

And then execute python like:
python mytool.py --input $1 --outputdir $tempfolder

And then move the files like:
cp ${tempfolder}/file1.txt $2

I don’t know if this is preferred; it is just to give you inspiration. By the way, don’t forget to remove your output_dir when you are done.

EDIT:
I just read the second post more carefully. You now have the output twice: once in your own folder and once in a Galaxy folder.

But if I remove the folder, will the output files still show up in Galaxy?
The fact is that my tool puts all the outputs in a specific directory and I need Galaxy to retrieve them from that folder. But as you said, I have the data both in the Galaxy folder and in my own folder. Maybe I should just move the files out of the session folder I created into the directory where the script runs and let Galaxy deal with them.

Yes, as long as you remove the folder only after you have moved or copied the files to Galaxy, so it will be the very last line of the bash script. $2 and $3 are basically paths.
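
A minimal sketch of that order of operations, using throwaway paths from mktemp so it can be run anywhere. The names session_dir, out1 and out2 are only stand-ins (assumptions for illustration) for the tool's real output folder and for the $2 and $3 paths Galaxy passes in:

```shell
#!/bin/sh
set -e

# Hypothetical stand-ins: a folder the tool wrote into, and two
# destination paths playing the role of $2 and $3 from Galaxy.
session_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
out1="${dest_dir}/dataset_1.dat"
out2="${dest_dir}/dataset_2.dat"
echo "result one" > "${session_dir}/file1.txt"
echo "result two" > "${session_dir}/file2.txt"

# First copy the results to the paths Galaxy expects...
cp "${session_dir}/file1.txt" "${out1}"
cp "${session_dir}/file2.txt" "${out2}"

# ...and only then remove the tool's own folder, as the very last step.
rm -rf "${session_dir}"
```

Because the copies are independent files, removing the folder afterwards should not affect what Galaxy shows.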

Perfect, many thanks!!
But if the folder is not a temporary folder but just a regular folder that I remove, is that a problem?

Don’t think that is a problem

It seems that the bash wrapper only calls Python (plus some post-processing). In my opinion you don’t need this bash wrapper script; you should code it directly in the tool’s command block.

Like @gbbio, I would suggest that you modify the Python script (assuming it’s your code) to just output to a configurable directory, e.g. python mytool.py --input $input --output output. The reason is that a central directory like /results/output-tools/MYTOOL/sessions/ (in your example) does not exist on other systems and complicates installation.

But I completely disagree that tempfolder=$(mktemp -d /galaxy/galaxy/database/files/XXXXXX) is a good idea. First of all, /galaxy/galaxy/database/files/ might not exist on all systems, and more importantly database/files/ is the central location of Galaxy’s files; one is not supposed to write into it. Also, the directory is writable only by the Galaxy system user, i.e. writing won’t work if Galaxy runs jobs as the real user.

The solution is simple. Each Galaxy job already runs in its own temporary directory (called the job working directory). So just create a directory (e.g. mkdir outdir) in the current working directory and use that. Galaxy will take care of cleanup after the job has finished.

So the command block could look something like:

mkdir outdir &&
python '$__tool_directory__/mytool.py' --input '$input' --output outdir &&
cp outdir/file1.txt '$file1' &&
cp outdir/file2.txt '$file2'

Note the use of &&, which ensures that each command runs only if the previous one succeeded.
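
To try this pattern outside Galaxy, here is a rough sketch that simulates a job working directory with mktemp. Everything in it is an assumption for illustration: fake_tool is a hypothetical stand-in for the python mytool.py call, and file1 and file2 stand in for the output paths Galaxy would substitute for '$file1' and '$file2':

```shell
#!/bin/sh
# Hypothetical stand-in for "python mytool.py --input ... --output DIR":
# it just writes two result files into the directory given as $1.
fake_tool() {
    echo "one" > "$1/file1.txt" &&
    echo "two" > "$1/file2.txt"
}

# Simulate Galaxy's job working directory and its two output paths.
job_dir=$(mktemp -d)
file1="${job_dir}/galaxy_out1.dat"
file2="${job_dir}/galaxy_out2.dat"
cd "${job_dir}" || exit 1

# Same shape as a command block: every step is chained with &&.
mkdir outdir &&
fake_tool outdir &&
cp outdir/file1.txt "${file1}" &&
cp outdir/file2.txt "${file2}"
status=$?

# A failing step stops the rest of the chain, so the touch never runs
# and the chain as a whole reports a non-zero exit status.
mkdir outdir2 &&
false &&
touch outdir2/never_created
failed_status=$?
```

In a real job, that non-zero exit status is what lets Galaxy notice that a step failed instead of collecting missing or stale output files.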

Instead of copying the output files you might also use the from_work_dir attribute of the <data> tag.


Thanks for the explanation!