Fix a workflow for variant calling from WGS data after a mutagenesis screen

Hi All,

I have attached the workflow that we used for a while to analyze WGS data, processing FASTQ files through to variant calling and annotation. This workflow no longer works, apparently because it depends on Unified Genotyper, which has been deprecated.

Compared to the workflow shown here, I have already replaced "BWA for Illumina" with "BWA (<100bp)". Since the output of the new BWA tool is already BAM, I also removed the SAM filter and the SAM-to-BAM conversion. I added "Mark Duplicates" instead and left "Add or Replace Groups" in place.

After this point I tried countless combinations of variant calling and annotation tools, with very poor results. The final tabular files I managed to obtain contained either a few hundred variants (too few) or ten thousand or more (too many), but in no case did they include the background mutation typical of the parental strain that I mutagenized. I can see the mutation when I look at the aligned sequence in the UCSC browser, so I deduced that something was wrong with the tools I used or the parameters I set.

In light of these considerations, I’d like to ask the following:

  1. Are the steps I have taken so far to modify the old workflow correct? Is there something I should add or remove?
  2. Which variant calling tool is most suitable for detecting germline mutations (I work with C. elegans), and which follow-up annotation and filtering tools should I use?
    From what I have read, the best new alternative to Unified Genotyper is HaplotypeCaller, but it is not available among the Galaxy tools.

Any advice or input will be very welcome! Thanks a lot!

Hi there and welcome,

Yes, the old CloudMap-like workflows for variant mapping and identification have been considered deprecated for a long time, partly because they depend on the old, deprecated GATK tools, but also because they have been superseded by the MiModD tool suite, a more flexible and sensitive alternative.

Starting from your existing workflow, you can keep the Trim Galore and BWA Mem steps, then switch over to the MiModD Variant Calling and MiModD Extract Variant Sites tools. That second tool lets you merge your mutated strain variants with known Hawaiian strain SNPs (if you’re doing Hawaiian strain based mapping). After that you can use MiModD VCF Filter to narrow down the list of candidate variants, annotate your variants with SnpEff eff, perform linkage analyses with MiModD NacreousMap and generate reports of likely phenotype-causing variants with MiModD Report Variants.
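
Whichever caller you end up with, it can help to check the resulting VCF directly for the known parental background mutation before you move on to filtering and annotation. Below is a minimal sketch in plain Python, assuming an uncompressed, tab-separated VCF. The function names, the example records and the coordinates (`chrX`, position 12345) are hypothetical placeholders for illustration; they are not part of MiModD or Galaxy.

```python
def iter_records(vcf_lines):
    """Yield (chrom, pos, ref, alts) for each data line of a plain-text VCF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta lines and the column header
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a valid VCF data line
        chrom, pos, _id, ref, alt = fields[:5]
        yield chrom, int(pos), ref, alt.split(",")


def count_variants(vcf_lines):
    """Count the variant records, to spot 'too few' or 'too many' calls."""
    return sum(1 for _ in iter_records(vcf_lines))


def vcf_has_variant(vcf_lines, chrom, pos, alt=None):
    """Return True if a record exists at chrom:pos (optionally with this ALT)."""
    for rec_chrom, rec_pos, _ref, alts in iter_records(vcf_lines):
        if rec_chrom == chrom and rec_pos == pos:
            if alt is None or alt in alts:
                return True
    return False


# Hypothetical example data; replace with the lines of your own VCF file.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chrX\t12345\t.\tC\tT\t60\tPASS\t.",
    "chrII\t500\t.\tG\tA,C\t45\tPASS\t.",
])

if __name__ == "__main__":
    lines = EXAMPLE_VCF.splitlines()
    print("variants called:", count_variants(lines))
    print("background mutation found:",
          vcf_has_variant(lines, "chrX", 12345, alt="T"))
```

For a real dataset you would pass an open file instead of the example lines, for instance `with open("calls.vcf") as fh: vcf_has_variant(fh, ...)`, where `calls.vcf` is again just a placeholder name. If the known mutation is missing right after variant calling, the problem lies in mapping or calling; if it only disappears later, look at the filter settings.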

The MiModD suite comes with comprehensive documentation, which you can find at https://mimodd.readthedocs.io. Here are two useful additional links to get you started:

The first one takes you to a Galaxy user training that shows the MiModD tools in action on now-classic Arabidopsis mapping-by-sequencing data. You should be able to transfer the concepts to C. elegans easily.
The second link has important extra information on using MiModD together with standard mappers (like bowtie2 and bwa-mem) and SnpEff on public Galaxy servers like usegalaxy.org. Please read those instructions carefully.

Good luck with your analyses,
Wolfgang

Thank you so much!!! I am going to go ahead and try it out following all the instructions!

:+1: :+1: