Wrapper: output file depending on input field, how to "make the link"

Hello,

We developed the vvv2_display tool for virus variant calling, along with its Galaxy wrapper.

We added a new option to produce an additional vcf output file. This output file must be created only if an option with the file name is provided (`-x snp_summary.vcf`, for instance).

In the wrapper, I created an optional input inside a conditional (`option_snp_vcf_summary|boolean`) and a conditional vcf output (with a `filter` to make it active or not):

  • text in input (`option_snp_vcf_summary|snp_vcf_summary_f`)
  • vcf in output (`option_snp_vcf_summary|snp_vcf_summary_f`)

However, I obtain only an empty vcf output file, whereas when the program is run on the command line, I get the expected vcf output file.

Here is my wrapper code:

<tool id="vvv2_display" name="vvv2_display: Display SNP proportions and CDS of an assembly in png image" version="0.2.5.0" python_template_version="3.9">
    <requirements>
      <requirement type="package" version="0.2.5.0">vvv2_display</requirement>	      
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
       vvv2_display.py -f '$vadr_fail_annotation' -p '$vadr_pass_annotation' -s '$vadr_seqstat' -n '$vardict_vcf' -r '$snp_img' -w '$var_significant_thres' -o '$cov_depth' -e '$cov_depth_corr' -t '$snp_loc' -u '$snp_loc_summary' -j '$json_annot' -k '$bed_annot' -l '$correct_vcf' -m '$contig_limits' -N '$contig_names' $cov_depth_scale 
       ## vcf significant output file option
       #if str($option_snp_vcf_summary.boolean) == '2':
           -x '${option_snp_vcf_summary.snp_vcf_summary_f}'
       #end if
    ]]></command>
    <inputs>
        <param type="data" name="vadr_fail_annotation" format="tabular" />
        <param type="data" name="vadr_pass_annotation" format="tabular" />
        <param type="data" name="vadr_seqstat" format="txt" />
        <param type="data" name="vardict_vcf" format="vcf" />
        <param type="integer" name="var_significant_thres" value="7" min="0" max="100" label="min int threshold to keep significant variant (%age)" format="int" />
        <param type="select" name="cov_depth_scale" label="tells if cov depth scale (ordinate) is displayed with log10 (default) or linear scale" help="tells if cov depth scale (ordinate) is displayed with log10 (default) or linear scale">
          <option value="">log10 scale</option>
			    <option value="-y">linear scale</option>
        </param>
        <param type="data" name="cov_depth" format="txt" />
        <conditional name="option_snp_vcf_summary">
          <param type="select" name="boolean" label="tells if vcf summary output file is created" help="tells if vcf summary output file is created">
            <option value="1">no summary vcf output</option>
		  	    <option value="2">additional vcf summary output</option>
          </param>
          <when value="1"/>
          <when value="2">
            <param type="text" name="snp_vcf_summary_f" format="vcf" argument="-x" help="provide the vcf summary output file name"/>
          </when>
        </conditional>  
    </inputs>
    <outputs>
        <data name="snp_img" format="png" label="png image showing variants"/>
        <data name="snp_loc_summary" format="tabular" label="tsv file of significant variants only"/>
        <!-- present only if ask for summary vcf output-->
        <data name="option_snp_vcf_summary|snp_vcf_summary_f" format="vcf" label="vcf file of significant variants only" from_work_dir="option_snp_vcf_summary|snp_vcf_summary_f">
          <filter>option_snp_vcf_summary['boolean'] == "2"</filter>
        </data>
	<!-- intermediate output files added for Galaxy compatibility -->
        <data name="snp_loc" format="txt" hidden="true" label="tsv file of all variants"/>
        <data name="json_annot" format="json" hidden="true" label="json file of annotations"/>
        <data name="bed_annot" format="bed" hidden="true" label="bed file of of annotations"/>
        <data name="correct_vcf" format="vcf" hidden="true" label="vcf file of variants corrected for positions when multi contigs"/>
        <data name="contig_limits" format="txt" hidden="true" label="txt file with contig limits"/>
        <data name="contig_names" format="txt" hidden="true" label="txt file with contig names"/>
        <data name="cov_depth_corr" format="txt" hidden="true" label="txt file with pos and cov depth, pos corrected when multi contigs"/>
	<!-- end intermediate output files added for Galaxy compatibility -->		
    </outputs>
    <tests>
      <test>
            <param name="vadr_fail_annotation" value="test_vvv2_display/res_vadr_fail.tbl" ftype="tabular"/>	
            <param name="vadr_pass_annotation" value="test_vvv2_display/res_vadr_pass.tbl" ftype="tabular"/>
            <param name="vadr_seqstat" value="test_vvv2_display/res_vadr.seqstat" ftype="txt"/>
            <param name="vardict_vcf" value="test_vvv2_display/res_vardict.vcf" ftype="vcf"/>
            <param name="var_significant_thres" value="7"/>
            <param name="cov_depth_scale" value="-y" />
            <param name="cov_depth" value="test_vvv2_display/res_covdepth.txt" />
            <param name="option_snp_vcf_summary|boolean" value="2"/>
            <output name="snp_img" file="test_vvv2_display/res_snp.png" ftype="png"/>	    
            <output name="snp_loc_summary" file="test_vvv2_display/res_snp_summary.tsv" ftype="tabular"/>
            <!-- <output name="option_snp_vcf_summary|snp_vcf_summary_f" file="test_vvv2_display/res_snp_summary.vcf" ftype="vcf"/> -->
	          <output name="option_snp_vcf_summary|snp_vcf_summary_f" file="test_vvv2_display/res_snp_summary.vcf" ftype="vcf"/>
	          <!-- intermediate output files -->
            <output name="snp_loc" file="test_vvv2_display/res_snp.txt" ftype="txt"/>
	          <output name="json_annot" file="test_vvv2_display/res_vadr.json" ftype="json"/>
            <output name="bed_annot" file="test_vvv2_display/res_vadr.4vardict.bed" ftype="bed"/>
            <output name="correct_vcf" file="test_vvv2_display/res_correct.vcf" ftype="vcf"/>
            <output name="contig_limits" file="test_vvv2_display/contig_limits.txt" ftype="txt"/>
            <output name="contig_names" file="test_vvv2_display/contig_names.txt" ftype="txt"/>
            <output name="cov_depth_corr" file="test_vvv2_display/res_covdepth_corrected.txt" ftype="txt"/>
	    <!-- end intermediate output files -->	    
      </test>
    </tests>
    <help><![CDATA[
[vvv2_display.py]
Aim: Display of SNP proportions, annotations, for an assembly
in:
- vardict variant calling output
- vadr assembly annotations
out:
- png file (image of SNP proportion alongside the assembly with CDS positions)
- txt file with variant calling summary, location in CDS and surround DNA sequence.

usage: vvv2_display.py [-h] [-p FILE] [-f FILE] [-s FILE] [-n FILE] [-r FILE]

optional arguments:
  -h, --help            show this help message and exit
  -p FILE, --pass_tbl_f FILE
                        in: tabular file of vadr annotations, with pass status
  -f FILE, --fail_tbl_f FILE
                        in: tabular file of vadr annotations, with fail status
  -s FILE, --seq_stat_f FILE
                        in: seq stat file of vadr annotator
  -n FILE, --vcf_f FILE
                        in: vcf variant file provided by vardict
  -r FILE, --png_var_f FILE
                        out: png file with variant proportions and annotations
  -o FILE, --cov_depth_f FILE
                        [optional] in: text file of coverage depths (given by samtools depth)
    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubvvv2_display,
  author = {Touzain, Fabrice},
  year = {2022},
  title = {vvv2_display},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/ANSES-Ploufragan/vvv2_display},
	}</citation>
      <citation type="doi">10.1016/j.virusres.2020.198201</citation>
    </citations>
</tool>

Could you explain how to get the file name from the input `option_snp_vcf_summary|snp_vcf_summary_f` and create the output vcf file with this name (or a Galaxy name, it does not matter to me), please?

Many thanks for your help.

Fabrice

Welcome @FTouzain

Hopefully we can help! Let’s break it down and then you can explain if I am misunderstanding something or can help more. :slight_smile:

For this part

The additional input can be an optional input.

This optional input could be 1) never required, or 2) only required when another choice is made, or 3) the wrapper automatically interprets the presence/absence of that optional input and does something different depending on the state at runtime. We have examples of all of these, so please ask if you cannot find an example existing wrapper and we can help to find it!

The first item I noticed is the 1/2 notation for the boolean here.

    <conditional name="option_snp_vcf_summary">
      <param type="select" name="boolean" label="tells if vcf summary output file is created" help="tells if vcf summary output file is created">
        <option value="1">no summary vcf output</option>
	  	    <option value="2">additional vcf summary output</option>

For any (most?) “boolean” value, you’ll need to use a `type="boolean"` parameter with true/false values instead. Some examples are in the IUC’s tool wrappers here.

Are you using Planemo for your development project yet? If not, I would start there, then see what it reports. The framework will help with all steps to get your wrapper published later on, too.

This section about conditional parameters seems relevant but you can double check me by searching the full docs with your keyword “bool”.

This is likely what you’ll want to do, too!

        <conditional name="algorithm">
            <param name="set_algorithm_params" type="boolean" label="Set Algorithm Parameters">
            </param>
            <when value="true">
                <param argument="-k" label="minimum seed length" type="integer" value="19" />
            </when>
            <when value="false">
            </when>
        </conditional>
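
As a rough illustration of how this pattern might be applied to the wrapper in question (a sketch only, not tested against vvv2_display): a boolean inside a conditional can be read with dot notation in the Cheetah command block and with dictionary notation in an output filter. The names `want_vcf` and `snp_vcf_summary`, the label text, and the fixed file name `snp_summary.vcf` are placeholders invented for this example.

```xml
<!-- Sketch only: fragment of a tool file, other parameters and outputs elided -->
<command detect_errors="exit_code"><![CDATA[
    vvv2_display.py
    ## ... other arguments elided ...
    #if $option_snp_vcf_summary.want_vcf
        -x 'snp_summary.vcf'
    #end if
]]></command>
<inputs>
    <conditional name="option_snp_vcf_summary">
        <!-- "want_vcf" is a hypothetical parameter name -->
        <param name="want_vcf" type="boolean" checked="false" label="Create the vcf summary output"/>
        <when value="true"/>
        <when value="false"/>
    </conditional>
</inputs>
<outputs>
    <!-- created only when the box is checked; picked up from the job working directory -->
    <data name="snp_vcf_summary" format="vcf" from_work_dir="snp_summary.vcf">
        <filter>option_snp_vcf_summary['want_vcf']</filter>
    </data>
</outputs>
```

The output name deliberately contains no `|`, and the file name passed to `-x` is the same string used in `from_work_dir`, which is how Galaxy would find the file the script writes.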

Then, for the output renaming, I’m not sure I understand this well yet. Could you explain more about this part?

You can give the output a consistent name in your <outputs> section, then set up the filter to use your true/false value instead of the “2”. If you are not sure about all the options, this is a good place to start:

I can think of one tool.xml that has optional inputs, and then optional outputs depending on the presence/absence of those optional inputs, that I think should cover what you are trying to do. Or, it may at least help us to clarify! Would you like to review? The macros have many conditional clauses that might be what you want to use too (instead of a boolean at all)!


Please review and we can follow up more. Or if you get this to work, please let us know! :slight_smile:

Dear Jennifer,
Thank you very much for your answer and sources for improvement.
Here is the corrected wrapper part for boolean usage:

<conditional name="option_snp_vcf_summary">
          <param type="boolean" name="summary_output_vcf_request" label="tells if vcf summary output file is created" help="tells if vcf summary output file is created">
          </param>
          <when value="false">
          </when>
          <when value="true">
            <param argument="-x" label="vcf summary output file name" type="text" name="snp_vcf_summary_f" format="vcf" help="provide the vcf summary output file name" optional="true"/>
          </when> 
 </conditional>

Thank you, it simplifies the code a little bit.

I use planemo lint to check syntax and wrapper structure.

More complete planemo features were not working for me on my computer. In addition, the first time I tried planemo, it was really annoying for the README part of the wrapper (this bug seems to have been fixed since).

  • Explanation of the snp_vcf_summary_f input (text) and output (vcf)
    The vvv2_display.py script takes an optional argument -x ‘snp_summary_file.vcf’, where the user provides the name of the vcf output file they want. The script uses this name to create an output file and writes the vcf results into it.

What I expect is to get the file written by the script as the wrapper output. Currently, when the user checks the box to request this output in the interface, a text field appears where the file name can be set. But when the script is run, the user obtains only an empty vcf file: Galaxy does not make the link between the “output file name” provided as input (with the `-x` argument) and the real vcf file with this name written by the python script. This is why I set the `from_work_dir` attribute in the following output code:

<data name="option_snp_vcf_summary|snp_vcf_summary_f" format="vcf" label="vcf file of significant variants only" from_work_dir="option_snp_vcf_summary|snp_vcf_summary_f">
          <filter>option_snp_vcf_summary[summary_output_vcf_request]</filter>       
</data>
…

and in the test code:

<param name="option_snp_vcf_summary|summary_output_vcf_request" value="true"/>
<output name="option_snp_vcf_summary|snp_vcf_summary_f" file="test_vvv2_display/res_snp_summary.vcf" ftype="vcf"/>

I apologize, I find macros quite complicated and hoped Galaxy had a simpler way to obtain the result, by taking the file name as a string input and collecting the script output that has this file name.

I tried to change the input “text” field to a “data” field so that Galaxy would pick up the output file, but the wrapper then displays a combo box to choose a file from my history. I did not manage to find, in the xml macros, an output name deduced from an input text field (argument), and I do not really know how to use macros; they seem so complicated, with so many lines and conditional outputs.

Huge thanks for your help.
Fabrice

Do you have the code of the tool in some github repo, ideally a pull request? Commenting there would be much more convenient.

First comments:

  • instead of the conditional, just use a boolean that will decide about generating / not.
  • just hardcode the file name in the command block, e.g. -x ‘snp_vcf.summary’, and use the same in from_work_dir="snp_vcf.summary"
  • the name of the output must not contain | but be a valid Cheetah placeholder. planemo lint should complain about it.
  • booleans do not have options subtags (they just evaluate to true/false in the command block)
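
The points above could be combined along the lines of the following sketch. It is one possible variant, not necessarily what was meant above: it uses the `truevalue`/`falsevalue` attributes so that the boolean expands directly to the command-line fragment, instead of an `#if` block. The file name `snp_summary.vcf` and the label texts are placeholders, and the fragment has not been tested against vvv2_display.

```xml
<!-- Sketch only: fragment of a tool file, other parameters and outputs elided -->
<command detect_errors="exit_code"><![CDATA[
    vvv2_display.py
        ## ... other arguments unchanged ...
        $summary_output_vcf_request
]]></command>
<inputs>
    <!-- expands to the truevalue string when checked, to an empty string otherwise -->
    <param name="summary_output_vcf_request" type="boolean" checked="false"
           truevalue="-x 'snp_summary.vcf'" falsevalue=""
           label="Create the vcf summary output file"/>
</inputs>
<outputs>
    <!-- the output name has no '|' and from_work_dir repeats the fixed file name -->
    <data name="snp_vcf_summary_f" format="vcf" from_work_dir="snp_summary.vcf"
          label="vcf file of significant variants only">
        <filter>summary_output_vcf_request</filter>
    </data>
</outputs>
```

The trade-off is that the file name now appears in the parameter definition and in `from_work_dir`, so both must be changed together; hardcoding it in the command block with `#if` keeps the parameter itself free of command-line details.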

Huge thanks. It solved my problem.

My boolean error was not reported by planemo (but I am using version 0.62.1).

I followed your advice (simplifying to one boolean and no conditional) and everything works fine now.
Here is the corrected code:

<tool id="vvv2_display" name="vvv2_display: Display SNP proportions and CDS of an assembly in png image" version="0.2.5.0" python_template_version="3.9">
    <requirements>
      <requirement type="package" version="0.2.5.0">vvv2_display</requirement>	      
    </requirements>
    <command detect_errors="exit_code"><![CDATA[
       vvv2_display.py -f '$vadr_fail_annotation' -p '$vadr_pass_annotation' -s '$vadr_seqstat' -n '$vardict_vcf' -r '$snp_img' -w '$var_significant_thres' -o '$cov_depth' -e '$cov_depth_corr' -t '$snp_loc' -u '$snp_loc_summary' -j '$json_annot' -k '$bed_annot' -l '$correct_vcf' -m '$contig_limits' -N '$contig_names' $cov_depth_scale 
       ## vcf significant output file option
       #if $summary_output_vcf_request: 
           -x 'res_snp_summary.vcf'
       #end if
    ]]></command>
    <inputs>
        <param type="data" name="vadr_fail_annotation" format="tabular" />
        <param type="data" name="vadr_pass_annotation" format="tabular" />
        <param type="data" name="vadr_seqstat" format="txt" />
        <param type="data" name="vardict_vcf" format="vcf" />
        <param type="integer" name="var_significant_thres" value="7" min="0" max="100" label="min int threshold to keep significant variant (%age)" format="int" />
        <param type="select" name="cov_depth_scale" label="tells if cov depth scale (ordinate) is displayed with log10 (default) or linear scale" help="tells if cov depth scale (ordinate) is displayed with log10 (default) or linear scale">
          <option value="">log10 scale</option>
			    <option value="-y">linear scale</option>
        </param>
        <param type="data" name="cov_depth" format="txt" />
        <param type="boolean" name="summary_output_vcf_request" label="tells if vcf summary output file is created" help="tells if vcf summary output file is created">
        </param>
    </inputs>
    <outputs>
        <data name="snp_img" format="png" label="png image showing variants"/>
        <data name="snp_loc_summary" format="tabular" label="tsv file of significant variants only"/>
	      <!-- intermediate output files added for Galaxy compatibility -->
        <data name="snp_loc" format="txt" hidden="true" label="tsv file of all variants"/>
        <data name="json_annot" format="json" hidden="true" label="json file of annotations"/>
        <data name="bed_annot" format="bed" hidden="true" label="bed file of of annotations"/>
        <data name="correct_vcf" format="vcf" hidden="true" label="vcf file of variants corrected for positions when multi contigs"/>
        <data name="contig_limits" format="txt" hidden="true" label="txt file with contig limits"/>
        <data name="contig_names" format="txt" hidden="true" label="txt file with contig names"/>
        <data name="cov_depth_corr" format="txt" hidden="true" label="txt file with pos and cov depth, pos corrected when multi contigs"/>
        <!-- end intermediate output files added for Galaxy compatibility -->
        <!-- present only if the summary vcf output is requested -->
        <data name="snp_vcf_summary_f" format="vcf" label="vcf file of significant variants only" from_work_dir="res_snp_summary.vcf">
          <filter>summary_output_vcf_request</filter>
        </data>
    </outputs>
    <tests>
      <test>
            <param name="vadr_fail_annotation" value="test_vvv2_display/res_vadr_fail.tbl" ftype="tabular"/>	
            <param name="vadr_pass_annotation" value="test_vvv2_display/res_vadr_pass.tbl" ftype="tabular"/>
            <param name="vadr_seqstat" value="test_vvv2_display/res_vadr.seqstat" ftype="txt"/>
            <param name="vardict_vcf" value="test_vvv2_display/res_vardict.vcf" ftype="vcf"/>
            <param name="var_significant_thres" value="7"/>
            <param name="cov_depth_scale" value="-y" />
            <param name="cov_depth" value="test_vvv2_display/res_covdepth.txt" />
            <param name="summary_output_vcf_request" value="true"/>
            <output name="snp_img" file="test_vvv2_display/res_snp.png" ftype="png"/>	    
            <output name="snp_loc_summary" file="test_vvv2_display/res_snp_summary.tsv" ftype="tabular"/>
            <!-- intermediate output files -->
            <output name="snp_loc" file="test_vvv2_display/res_snp.txt" ftype="txt"/>
	          <output name="json_annot" file="test_vvv2_display/res_vadr.json" ftype="json"/>
            <output name="bed_annot" file="test_vvv2_display/res_vadr.4vardict.bed" ftype="bed"/>
            <output name="correct_vcf" file="test_vvv2_display/res_correct.vcf" ftype="vcf"/>
            <output name="contig_limits" file="test_vvv2_display/contig_limits.txt" ftype="txt"/>
            <output name="contig_names" file="test_vvv2_display/contig_names.txt" ftype="txt"/>
            <output name="cov_depth_corr" file="test_vvv2_display/res_covdepth_corrected.txt" ftype="txt"/>
	          <!-- end intermediate output files -->	    
            <output name="snp_vcf_summary_f" file="test_vvv2_display/res_snp_summary.vcf" ftype="vcf"/>
      </test>
    </tests>
    <help><![CDATA[
[vvv2_display.py]
Aim: Display of SNP proportions, annotations, for an assembly
in:
- vardict variant calling output
- vadr assembly annotations
out:
- png file (image of SNP proportion alongside the assembly with CDS positions)
- txt file with variant calling summary, location in CDS and surround DNA sequence.

usage: vvv2_display.py [-h] [-p FILE] [-f FILE] [-s FILE] [-n FILE] [-r FILE]

optional arguments:
  -h, --help            show this help message and exit
  -p FILE, --pass_tbl_f FILE
                        in: tabular file of vadr annotations, with pass status
  -f FILE, --fail_tbl_f FILE
                        in: tabular file of vadr annotations, with fail status
  -s FILE, --seq_stat_f FILE
                        in: seq stat file of vadr annotator
  -n FILE, --vcf_f FILE
                        in: vcf variant file provided by vardict
  -r FILE, --png_var_f FILE
                        out: png file with variant proportions and annotations
  -w VAR_SIGNIFICANT_THRESHOLD, --var_significant_threshold VAR_SIGNIFICANT_THRESHOLD
                        (percentage var_significant_threshold) Define minimal
                        proportion of a variant to be kept in significant
                        results     
  -y, --covdepth_linear_scale
                        [Optional] to display covepth ordinates in linear
                        scale (default log10 scale)
  -o FILE, --cov_depth_f FILE
                        [optional] in: text file of coverage depths (given by samtools depth)
  -e FILE, --cov_depth_corr_f FILE
                        [optional] out: text file of coverage depths with
                        cumulated position in case of several contigs, for
                        display (tmp file, for galaxy compatibility)
  -t FILE, --snp_loc_f FILE
                        [optional] out: variant description for relevant
                        positions, tsv file (if not provided, file name
                        deduced from png name)
  -u FILE, --snp_loc_summary_f FILE
                        [optional] out: variant description for relevant
                        positions, tsv file (if not provided, file name
                        deduced from png name)
  -x FILE, --snp_vcf_summary_f FILE
                        [optional] out: variant description for relevant
                        positions, vcf file (for SnpEff of NextClade
                        downstream analyses)
    ]]></help>
    <citations>
        <citation type="bibtex">
@misc{githubvvv2_display,
  author = {Touzain, Fabrice},
  year = {2022},
  title = {vvv2_display},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/ANSES-Ploufragan/vvv2_display},
	}</citation>
      <citation type="doi">10.3390/v17101385</citation>
      <citation type="doi">10.1016/j.virusres.2020.198201</citation>
    </citations>
</tool>
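
A second test covering the unchecked case might look like the sketch below; it would check that the vcf summary output is filtered out when the boolean is false. This is an assumption-laden example rather than part of the working wrapper: the value 9 for `expect_num_outputs` assumes that the nine remaining outputs, including the hidden ones, are all counted, and only one output is compared to keep the fragment short.

```xml
<!-- Sketch only: additional test for the case where no vcf summary is requested -->
<test expect_num_outputs="9">
    <param name="vadr_fail_annotation" value="test_vvv2_display/res_vadr_fail.tbl" ftype="tabular"/>
    <param name="vadr_pass_annotation" value="test_vvv2_display/res_vadr_pass.tbl" ftype="tabular"/>
    <param name="vadr_seqstat" value="test_vvv2_display/res_vadr.seqstat" ftype="txt"/>
    <param name="vardict_vcf" value="test_vvv2_display/res_vardict.vcf" ftype="vcf"/>
    <param name="var_significant_thres" value="7"/>
    <param name="cov_depth_scale" value="-y"/>
    <param name="cov_depth" value="test_vvv2_display/res_covdepth.txt"/>
    <!-- box unchecked: the snp_vcf_summary_f output should not be created -->
    <param name="summary_output_vcf_request" value="false"/>
    <output name="snp_loc_summary" file="test_vvv2_display/res_snp_summary.tsv" ftype="tabular"/>
</test>
```

If the count turns out to differ on a given Galaxy or Planemo version, the number can simply be adjusted; the point is that the test fails if the filtered output appears unexpectedly.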

The GitHub repo of the program behind the wrapper is the project ANSES-Ploufragan/vvv2_display (with an associated conda/mamba package) (sorry, the form does not let me post a link).

I also created 4 Galaxy workflows for various sequencing technologies, with all the steps needed beforehand (alignment, vadr annotation, vardict-java variant calling, coverage depth computation); they currently run only on our local instance.
I do not know how to request installation elsewhere.

Many many thanks again, great job really.
Fabrice

Hi @FTouzain Glad you have this working!

For publishing your work out to the wider Galaxy communities, I can help!

Workflows → https://iwc.galaxyproject.org/ (see the Contributing link near the top)

Tools → Main ToolShed repository https://toolshed.g2.bx.psu.edu/ with standards maintained by the IUC (tools-iuc/README.md in the galaxyproject/tools-iuc GitHub repository) and support provided by Planemo (“Publishing to the Tool Shed” in the Planemo documentation).

These can become stable artifacts you and others can reference in publications! End users will be searching and importing the workflow resources and administrators can install the tools to support them. You can also consider creating :graduation_cap: GTN Training materials to teach everyone how to use your methods. :rocket: