Missing information VCFtoTab-delimited

Hello Galaxy community,

I am trying to convert my VCF into a tab-delimited format. When using VCFtoTab-delimited on usegalaxy.com, I noticed that all variants with annotations for more than one transcript are excluded in the process, so I end up with a tabular file that is missing information.

Thanks for your help.

Rose

Hi @roselucia

Correct, the tool expects each variant to be unique in order to be valid. Filtering your BAM dataset by mapQ = 20 before calling variants will help ensure that your SNPs are unique. Try using Filter BAM datasets on a variety of attributes (Galaxy Version 2.4.1), or just search the tool panel with the keywords “bam” and “filter” to see all options.

The documentation for the underlying VCF tools explains usage in more detail: https://github.com/vcflib/vcflib#executables. The command-line options supported for these tools (and most others) are listed in the help section of the Galaxy-wrapped version (scroll down on the tool form).

If an option is unclear (for any tool), try toggling and testing options, and review related tools, until you get the output you want. Also review the underlying tool’s documentation (it is usually linked on the tool form).

There are many ways to manipulate both vcf and tabular data in Galaxy.

A VCF with the header removed is a tabular dataset, but the data won’t be parsed out the same way as VCFtoTab-delimited does it. That said, tools under the grouping GENERAL TEXT TOOLS could be used to create your own data parsing choices from the vcf-no-header formatting.
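To make "your own data parsing choices" concrete, here is a minimal Python sketch of one such choice: emitting one output row per annotation instead of dropping variants that carry several. It assumes the transcript annotations sit in a comma-separated INFO key, called `ANN` here purely for illustration; the key name, the function name `expand_info_key`, and the sample line are all hypothetical, so check your own VCF header for the real key.

```python
def expand_info_key(vcf_line, key="ANN"):
    """Return one tab-delimited row per comma-separated value of an INFO key.

    Assumes a header-free VCF data line with the standard eight leading
    columns. The key name "ANN" is an assumption; use the key from your file.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    info = fields[7]

    values = [""]  # variants without the key still yield one row
    for entry in info.split(";"):
        name, sep, value = entry.partition("=")
        if name == key and sep:
            values = value.split(",")
            break

    return ["\t".join([chrom, pos, ref, alt, v]) for v in values]


# Hypothetical variant annotated against two transcripts.
line = "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10;ANN=G|missense|TX1,G|synonymous|TX2"
rows = expand_info_key(line)
for row in rows:
    print(row)
```

Each annotation ends up on its own row with the variant coordinates repeated, so nothing is lost when a variant maps to more than one transcript.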

If you want to remove the VCF header, use Select with the option to keep only lines that do not match the regular expression: ^#. Change the datatype to be tabular after (pencil icon > Datatypes) so the other text manipulation tools will recognize the dataset as an appropriate input.
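Outside of Galaxy, the same header-stripping step can be sketched in a few lines of Python. This only illustrates what the Select filter does with the `^#` pattern; the function name `strip_vcf_header` and the sample lines are made up for the example.

```python
import re

# Same pattern as in the Select tool: any line that starts with "#".
HEADER_RE = re.compile(r"^#")


def strip_vcf_header(lines):
    """Keep only lines that do NOT match ^# (drops meta and column-header lines)."""
    return [line for line in lines if not HEADER_RE.match(line)]


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10",
    "chr1\t200\t.\tC\tT\t60\tPASS\tDP=12",
]
body = strip_vcf_header(example)
for line in body:
    print(line)
```

Note that this also removes the `#CHROM` column-name line, so the remaining rows have no column labels, just as with the Select approach.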

Hope that gives you some choices :slight_smile:

Hi, I have another tool choice: bcftools query (listed under BCFTOOLS). It extracts fields from a VCF/BCF file and prints them in a user-defined format.

This tool will output the “first” multi-mapping SNP. It might be worth comparing that output with the result of this alternative: filter the BAM by mapQ=20 > call SNPs to create a VCF > run the VCF through VCFtoTab-delimited.
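For anyone running bcftools outside Galaxy, here is a rough Python sketch of how a `bcftools query` call with a user-defined format could be assembled. The `-f` (format) and `-o` (output) options and the `%CHROM`, `%POS`, `%REF`, `%ALT`, and `%INFO/TAG` placeholders are standard bcftools query syntax; the helper name `build_bcftools_query`, the file names, and the choice of the `DP` tag are assumptions for illustration only.

```python
def build_bcftools_query(vcf_path, out_path, info_tags=()):
    """Build a `bcftools query` command that prints one tab-delimited row per record."""
    columns = ["%CHROM", "%POS", "%REF", "%ALT"]
    columns += [f"%INFO/{tag}" for tag in info_tags]
    # bcftools expands the literal \t and \n escapes in the format string itself.
    fmt = r"\t".join(columns) + r"\n"
    return ["bcftools", "query", "-f", fmt, "-o", out_path, vcf_path]


# Hypothetical file names; "DP" stands in for whichever INFO tag you need.
cmd = build_bcftools_query("input.vcf", "variants.tsv", info_tags=["DP"])
print(" ".join(cmd))

# To actually run it (requires bcftools on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

In Galaxy the same format string goes into the tool form rather than onto a command line.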