How to combine 2 VCF files in a Workflow...

Silly, simple question, but I can’t find a good solution.

I have 2 VCF files being produced in a workflow, and I’d like to combine these files into a single VCF. The bcftools concat tool will do it, but how do I feed both files to the tool in a workflow?

Cheers,

Mat


For a tool with a multiple-data input, you should be able to drag multiple datasets onto the same input circle in the workflow editor.
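Once several datasets are wired into one multiple-data input, the tool receives them as an ordered list and combines them. As a rough mental model only, the hypothetical Python function `concat_vcf_text` below concatenates plain-text VCFs by keeping the first file's header lines and appending every file's data lines in input order. It is not how bcftools concat is implemented: it performs no header or sample checks, does not handle compressed files, and does not re-sort records.

```python
from typing import Iterable, List


def concat_vcf_text(vcfs: Iterable[str]) -> str:
    """Toy concatenation of plain-text VCF contents (hypothetical helper).

    Keeps the '#' header lines of the first file only and appends the data
    lines of every file in the order given. No header/sample checks, no
    compression handling, no re-sorting.
    """
    header: List[str] = []
    records: List[str] = []
    for index, text in enumerate(vcfs):
        for line in text.splitlines():
            if line.startswith("#"):
                if index == 0:  # header lines of later files are dropped
                    header.append(line)
            elif line:  # skip blank lines
                records.append(line)
    return "\n".join(header + records) + "\n"


if __name__ == "__main__":
    head = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    caller_a = head + "chr1\t100\t.\tA\tG\t50\tPASS\t.\n"
    caller_b = head + "chr1\t200\t.\tC\tT\t40\tPASS\t.\n"
    print(concat_vcf_text([caller_a, caller_b]), end="")
```

Because records are appended rather than merged, two inputs that report the same site produce duplicate lines in this toy.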

Apologies, I should have been more specific. The inputs start out as a paired collection, the aligner combines them into a single BAM, and then I would like to run 2 variant callers and combine the output VCFs.
Something like this…

This doesn’t seem to work on a collection. What am I doing wrong? 🙂
Cheers,
Mat

Looks like you don’t have two VCF files but two collections of VCF files (ribbons as inputs). Do you want to merge the collections element-wise, creating a new collection of combined VCFs?


Yep 🙂


Ideally you could do with collections what @jxtx showed for regular datasets (two or more datasets going into an input that accepts multiple datasets). It does become ambiguous with collections, though: should the input collections be fully consumed so that you get a single output dataset, or treated element-wise so that you get a single output collection back? And if element-wise, should we iterate over the top-level collection or the lowest-level collection?
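The ambiguity can be made concrete with a small sketch. Assuming a collection is modelled as a dict of element identifier to dataset name, and a multiple-data tool as any function from a list of datasets to one dataset (all names here are hypothetical, not Galaxy APIs), the two readings produce different shapes of output: one dataset versus one collection. The sketch only covers flat lists, so the top-level versus lowest-level question for nested collections does not arise.

```python
from typing import Callable, Dict, List

# Hypothetical model: a collection maps element identifier -> dataset name,
# and a multiple-data tool turns a list of datasets into one dataset.
Collection = Dict[str, str]
MultiTool = Callable[[List[str]], str]


def consume_all(tool: MultiTool, *collections: Collection) -> str:
    """Reading 1: feed every element of every collection to one tool run."""
    datasets = [ds for coll in collections for ds in coll.values()]
    return tool(datasets)


def element_wise(tool: MultiTool, *collections: Collection) -> Collection:
    """Reading 2: one tool run per shared identifier, giving a new collection."""
    identifiers = list(collections[0])
    # Element-wise pairing needs the same set of identifiers in every collection.
    if any(set(c) != set(identifiers) for c in collections[1:]):
        raise ValueError("collections do not share the same element identifiers")
    return {i: tool([c[i] for c in collections]) for i in identifiers}


if __name__ == "__main__":
    concat = " + ".join  # stand-in for a real concat tool
    caller1 = {"sample1": "s1.caller1.vcf", "sample2": "s2.caller1.vcf"}
    caller2 = {"sample1": "s1.caller2.vcf", "sample2": "s2.caller2.vcf"}
    print(consume_all(concat, caller1, caller2))   # one dataset
    print(element_wise(concat, caller1, caller2))  # one collection
```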

There is one trick that should work now. Use the Zip Collection tool to create a list of pairs from your two VCF collections. Then run the Apply Rule to Collection tool with this list of pairs as input: click Edit in the tool form, then click Rules -> Add / Modify Column Definitions. Click Add Definition -> List Identifier and select column A, then click Assign another column and select column B. Click Apply, then Save, and run the tool. You’ll get a list of lists in which each inner list contains 2 elements, one for each variant caller. You can use this list of lists as the input to bcftools concat to get a single collection of combined VCFs.
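The reshaping steps can be sketched in plain Python. This is a toy model, not Galaxy code: collections are nested dicts; `zip_lists`, `apply_rules_a_then_b` and `map_over_outer` are hypothetical names; Zip Collection is assumed to pair elements by position under the inner identifiers `forward` and `reverse`; and bcftools concat is assumed to run once per outer element when given a list of lists. With two caller outputs derived from the same input collection, identifiers and order line up, so pairing by position or by identifier gives the same result.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Nested:
    """Toy two-level collection: outer id -> inner id -> dataset name."""
    collection_type: str  # label only, e.g. "list:paired" or "list:list"
    elements: Dict[str, Dict[str, str]] = field(default_factory=dict)


def zip_lists(first: Dict[str, str], second: Dict[str, str]) -> Nested:
    """Assumed Zip Collection behaviour: pair by position, reuse first list's ids."""
    pairs = {
        ident: {"forward": ds1, "reverse": ds2}
        for (ident, ds1), ds2 in zip(first.items(), second.values())
    }
    return Nested("list:paired", pairs)


def apply_rules_a_then_b(coll: Nested) -> Nested:
    """Rule-builder style: one row per dataset with columns A (outer id) and B
    (inner id); A becomes the outer list identifier, B the inner one."""
    rows = [(a, b, ds) for a, inner in coll.elements.items() for b, ds in inner.items()]
    out = Nested("list:list")
    for a, b, ds in rows:
        out.elements.setdefault(a, {})[b] = ds
    return out


def map_over_outer(tool: Callable[[List[str]], str], coll: Nested) -> Dict[str, str]:
    """Assumed mapping: one tool run per outer element, consuming its inner list."""
    return {a: tool(list(inner.values())) for a, inner in coll.elements.items()}


if __name__ == "__main__":
    caller_a = {"sample1": "s1.callerA.vcf", "sample2": "s2.callerA.vcf"}
    caller_b = {"sample1": "s1.callerB.vcf", "sample2": "s2.callerB.vcf"}
    nested = apply_rules_a_then_b(zip_lists(caller_a, caller_b))
    print(nested.collection_type)  # list:list
    print(map_over_outer(" + ".join, nested))
```

The data itself does not change between the zipped collection and the rule output; only the collection type changes from a list of pairs to a list of lists, which is the shape the concat step is fed.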

Unfortunately the user interface in the workflow editor doesn’t let you do this at the moment, but you can extract this operation from the history. I’ve done this here: https://usegalaxy.org/u/marius/w/workflow-constructed-from-history-test-1.
Hope that helps!


I actually do this a bit differently than what @mvdbeek suggested.
Use the Merge Collections tool to merge all of your collections into one big one, but set the advanced option ‘How should conflicts (or potential conflicts) be handled?’ to ‘Append suffix to every element identifier’ and keep the default suffix ‘_#’. Then send that into the Apply Rules tool with the following rule:

{
    "rules": [
        {
            "type": "add_column_metadata",
            "value": "identifier0"
        },
        {
            "type": "add_column_regex",
            "target_column": 0,
            "expression": "\\d+$"
        }
    ],
    "mapping": [
        {
            "type": "list_identifiers",
            "columns": [
                1,
                0
            ],
            "editing": false
        }
    ]
}

The benefit of this method is that it works with any number of input collections. You can then use the Relabel Collection tool to change the identifiers from numbers if need be.
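The rule can be read step by step. The sketch below follows the JSON literally under two assumptions not stated in the post: Merge Collections replaces `#` in the suffix with the 1-based position of each input collection, and the rule builder treats the first column listed in `list_identifiers` as the outer level. `merge_with_suffix` and `apply_rule` are hypothetical helper names. Under those assumptions the trailing number becomes the outer identifier, which fits the remark that the identifiers may need relabelling from numbers. Since the column order decides how datasets are grouped, it is worth checking the grouping on a small test history before wiring it into a workflow.

```python
import re
from typing import Dict, List


def merge_with_suffix(collections: List[Dict[str, str]], suffix: str = "_#") -> Dict[str, str]:
    """Assumed Merge Collections behaviour with 'append suffix to every element
    identifier': '#' is replaced by the 1-based position of the input collection."""
    merged: Dict[str, str] = {}
    for position, coll in enumerate(collections, start=1):
        for ident, ds in coll.items():
            merged[ident + suffix.replace("#", str(position))] = ds
    return merged


def apply_rule(merged: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Follow the JSON rule literally.

    column 0: add_column_metadata identifier0 -> the element identifier
    column 1: add_column_regex on column 0 with \\d+$ -> its trailing digits
    mapping list_identifiers [1, 0]: column 1 outer, column 0 inner (assumed order)
    """
    grouped: Dict[str, Dict[str, str]] = {}
    for ident, ds in merged.items():
        col0 = ident
        match = re.search(r"\d+$", col0)
        if match is None:
            raise ValueError(f"identifier {ident!r} has no trailing digits")
        col1 = match.group(0)
        grouped.setdefault(col1, {})[col0] = ds
    return grouped


if __name__ == "__main__":
    caller_a = {"sample1": "s1.callerA.vcf", "sample2": "s2.callerA.vcf"}
    caller_b = {"sample1": "s1.callerB.vcf", "sample2": "s2.callerB.vcf"}
    merged = merge_with_suffix([caller_a, caller_b])
    print(list(merged))  # ['sample1_1', 'sample2_1', 'sample1_2', 'sample2_2']
    print(apply_rule(merged))
```

With the inputs used here, each outer group collects the elements that came from the same input collection.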

Thanks Nolan, Marius, and James,
I’m going to work through those suggestions. I appreciate your time; you’ve saved me many hours. We have finally got round to attempting to move an old hard-coded NGS germline pipeline over to our Galaxy instance.
Thanks again,
Mat

Hello

I am trying to combine the different variants, but the tool says it requires version 1.4. Please suggest how to fix this; I am stuck.

Hello,
I tested the bcftools concat tool suggested by James on usegalaxy.org with two VCF input files, with the output type set to uncompressed VCF, and it worked as expected.
Please provide more information on where/how you’re getting the error so we can further assist you.

Actually, I have 10 mutants and a single variant file for each mutant. Because I followed a simple backcrossing design, I want to add these variant files from each mutant and then subtract them to get the linkage and the gene list. Earlier I used the "Combine Variants" tool and it worked perfectly, but now it shows an error asking for version 1.4, whereas the version I used before was 0.0.4. If I use bcftools concat as you mentioned, which options do I need to set so that it gives me a VCF file? Please advise.

This is the error I am getting