Job outputs not collected.

Dear All,
I deployed Galaxy locally using sh run.sh on Ubuntu 16.04 LTS. My tool is dockerized and set to run locally through the Galaxy interface. The tool ran OK, as I saw the files being generated, but the Galaxy interface couldn't list the files on the web page. Here are my Galaxy logs copied from my terminal (at galaxy logs).

My job_conf.xml is as follows:

<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="1"/>
    </plugins>
    <destinations default="local">
        <destination id="local" runner="local"/>
        <destination id="docker_local" runner="local">
          <param id="docker_enabled">true</param>
        </destination>

        <destination id="metaboflow-container" runner="local">
           <param id="docker_enabled">true</param>
           <param id="docker_volumes">$defaults,/opt/galaxy_data:rw</param> -->
           <param id="docker_sudo">false</param>
           <param id="docker_auto_rm">false</param>
        </destination>

    </destinations>
    <tools>

     <tool id="metaboflow" destination="metaboflow-container"/>
    </tools>
</job_conf>

My tool XML is as follows:

<tool id="metaboflow" name="Metabo Flow (in docker)">
    <description>Metabo Flow</description>
    <requirements>
      <container type="docker">jianlianggao/metaboflow_pre:20203</container>
    </requirements>
    <command><![CDATA[
      /usr/local/bin/runPreProc.R -d ${testdata_input} -p ${pre_define_sd} -o \$PWD
     ]]>
    </command>
    <inputs>
       <param name="testdata_input"  type="data" label="Test Dataset" help="Demo data set (Dataset_Metaboflow_Ionomic_Workflow.csv) can be downloaded from the link below." />
        <param name="pre_define_sd"  type="data"  label="Pre-defined standard deviation matrix" />
    </inputs>
    <outputs>
      <!--  <collection name="metaboflow_output_plots" type="list" label="MetaboFlow output plots from ${on_string}">

             <discover_datasets pattern="(?P&lt;name&gt;)\.pdf$" ext="pdf" />

                 </collection> -->
         <data name="data_long" format="csv" /> <!--  from_work_dir="data_long.csv" label="data_long.csv"/>
          <data name="output2" file="/opt/galaxy_data/data_wide.csv" label="data.wide.csv"/>
         <data name="output3" file="/opt/galaxy_data/data_wide_Symb.csv" label="data.wide_Symb.csv"/>
         <data name="data4" file="/opt/galaxy_data/stats.median_batch_corrected_data.txt" label="stats.median_batch_corrected_data.txt"/>
         <data name="data5" file="/opt/galaxy_data/stats.outliers.txt" label="stats.outliers.txt"/>
         <data name="data6" file="/opt/galaxy_data/stats.raw_data.txt" label="stats.raw_data.txt"/>
         <data name="data7" file="/opt/galaxy_data/stats.standardised_data.txt" label="stats.standardised_data.txt"/>
         <data name="data8" file="/opt/galaxy_data/plot.logConcentration_by_batch.pdf" label="plot.logConcentration_by_batch.pdf"/>
         <data name="data9" file="/opt/galaxy_data/plot.logConcentration_z_scores.pdf" label="plot.logConcentration_z_scores.pdf"/> -->
    </outputs>
    <tests>
      <test>
         <param name="testdata_input" value="Dataset_Metaboflow_Ionomic_Workflow.csv"/>
         <param name="pre_define_sd" value="pre_defined_sd.txt"/>
         <output name="data_long" file="data_long.csv"/>
      </test>
    </tests>
    <help>
    </help>
</tool>

I look forward to getting your comments to help me sort out my issue. Thank you very much.

Best regards,
Jianliang


Although I cannot give a full answer and only have some small hints for now, it does not look okay to me.
This line, for example:

<data name="data_long" format="csv" />

You can see this line as “the output path”: the variable $data_long contains the path and filename where Galaxy expects the output. But I don’t see you passing this variable to the runPreProc.R script, so how can your script know where to write the output so that Galaxy can pick it up?

I think the answer is to either use from_work_dir, which you have commented out now, or change the command and execute the script like:

/usr/local/bin/runPreProc.R -d ${testdata_input} -p ${pre_define_sd} -o \$PWD -outputfile $data_long

To use from_work_dir, I am guessing your code should look something like this:

/usr/local/bin/runPreProc.R -d ${testdata_input} -p ${pre_define_sd} -o "output"
<data name="data_long" format="csv"  from_work_dir="output/data_long.csv" label="data_long"/>

Here, your runPreProc.R script would write all the output files to a folder named “output”. You need to create this folder first.

It may also be possible that something like:

/usr/local/bin/runPreProc.R -d ${testdata_input} -p ${pre_define_sd} -o ""
<data name="data_long" format="csv"  from_work_dir="data_long.csv" label="data_long"/>

also works, but I don’t know what your script looks like.


Hi Marten,

Thank you very much for your comments.
My script only takes 3 parameters, they are immediately after the switches -d -p -o. The files are generated in the R script. It doesn’t take $data_long (from output name). It doesn’t make sense for me to have the parameter in command line. As I commented out in the tool xml file, I will have more files to collect. Should I put all the variable names in the command line?

Best regards,
Jianliang

I see there are quite a lot of files, so I would use the from_work_dir method.

So

Should I put all the variable names on the command line?

No

You can reply if it is still not clear, and then I or someone else can explain it better or in more detail. I understand that my first answer is a bit cryptic at the moment =)
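
For example, here is a minimal sketch of what the outputs section could look like with the from_work_dir approach, assuming the script is run with -o \$PWD so that it writes its files straight into the job working directory (the filenames are taken from the block you commented out):

<outputs>
    <data name="data_long" format="csv" from_work_dir="data_long.csv" label="data_long.csv"/>
    <data name="data_wide" format="csv" from_work_dir="data_wide.csv" label="data_wide.csv"/>
    <data name="stats_raw_data" format="txt" from_work_dir="stats.raw_data.txt" label="stats.raw_data.txt"/>
    <data name="plot_logConcentration_by_batch" format="pdf" from_work_dir="plot.logConcentration_by_batch.pdf" label="plot.logConcentration_by_batch.pdf"/>
    <!-- ...and so on for the other stats.* and plot.* files -->
</outputs>

With this, nothing extra has to be passed on the command line; Galaxy collects each file from the working directory after the job finishes.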

Hi Marten,

Thank you very much. I still got errors, as my R code couldn’t recognize the subfolder “output”. Should I create the subfolder in my tool XML or in my job_conf.xml before the R script is run?

Best regards,
Jianliang

My first question would be: did you try it without that extra output folder, i.e. doing something like -o "" when you execute the script?

If you need or want that folder, I think in your case you could do something like:

<![CDATA[
      mkdir output && /usr/local/bin/runPreProc.R -d ${testdata_input} -p ${pre_define_sd} -o output
     ]]>

So either in your tool XML or in the R script itself.
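
And if you do keep that extra folder, then (just a sketch, I have not tested it) the from_work_dir paths would need to include the subfolder, e.g.:

<data name="data_long" format="csv" from_work_dir="output/data_long.csv" label="data_long.csv"/>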


Hi Marten,

Thank you very much. I will give it a go later today. By the way, there is another potential issue: I notice that after the Docker image is run, the Docker daemon kills the container. Galaxy may not be able to access the files because they only exist inside the container. Maybe I need a persistent volume to keep the files for the Galaxy web page to access? I am not sure; I am very new to Galaxy.

Best regards,
Jianliang

Ah, I am also not sure. I have not used Docker in combination with Galaxy yet. Hopefully someone can help you with that part.

EDIT:

I guess it should work:
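
If I read the Galaxy documentation correctly, the local runner with docker_enabled mounts the job working directory into the container (that is part of what $defaults in docker_volumes expands to), so anything the tool writes to the working directory should still be there for Galaxy to collect after the container is removed. A minimal sketch, based on the destination you already have in your job_conf.xml:

<destination id="metaboflow-container" runner="local">
    <param id="docker_enabled">true</param>
    <!-- $defaults already mounts the job/working directories (working directory read-write);
         the extra /opt/galaxy_data mount is only needed if the script writes outside of them -->
    <param id="docker_volumes">$defaults,/opt/galaxy_data:rw</param>
</destination>

So as long as the outputs end up in the working directory (the from_work_dir approach above), no extra persistent volume should be needed.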

Hi Marten,

Thank you very much. My tool XML uses the same tags as before for the Docker container configuration. I added a folder-creation step to my R script and switched off the messages when loading the dependent packages. Now it works. Hooray!!!

Best regards,
Jianliang
