How to filter out rare variants (below 10%)

Hi, I have a huge 107 GB file. It is pooled data consisting of 50 whole genomes.
I need to filter out the rare variants (below 10%) from this 107 GB .bam file.
What tool should I use for this, please?
Thank you in advance.

Lidia

Can’t give a full solution now but you could start by looking at freebayes

If you are starting from a BAM file, you need to both call and filter variants to get to all sites that are more than 10% variable.

Since you know the number of genomes, you can use `--pooled-discrete` with a `--ploidy` of 100 (50 genomes × 2 chromosome copies each, assuming the genomes are diploid). Or you can use `--pooled-continuous`.
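To make the ploidy arithmetic and the two pooling modes concrete, here is a rough Python sketch that only assembles a freebayes command line as a list of arguments; it does not run anything. It assumes the 50 genomes are diploid (50 × 2 = 100), and the file names `pool.bam` and `reference.fa` are placeholders, not taken from the question. In Galaxy you would set the same options in the FreeBayes tool form instead of typing a command.

```python
def freebayes_command(bam, reference, n_genomes=None, copies_per_genome=2):
    """Build (but do not run) a freebayes command line for pooled data.

    If n_genomes is given, use discrete pooling with
    ploidy = n_genomes * copies_per_genome.
    Otherwise use continuous pooling, which does not need a ploidy.
    """
    cmd = ["freebayes", "--fasta-reference", reference, "--bam", bam]
    if n_genomes is not None:
        ploidy = n_genomes * copies_per_genome
        cmd += ["--pooled-discrete", "--ploidy", str(ploidy)]
    else:
        cmd += ["--pooled-continuous"]
    return cmd


# Placeholder file names; replace with your own data.
discrete = freebayes_command("pool.bam", "reference.fa", n_genomes=50)
continuous = freebayes_command("pool.bam", "reference.fa")

print(" ".join(discrete))
print(" ".join(continuous))
```

The discrete form is the one to try first when the pool size is known; the continuous form is a fallback when it is not.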

Then you can filter using VCFFilter. The AF field in the INFO column of the VCF contains the allele frequency information.
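If it helps to see what such a filter does, the sketch below is a minimal pure-Python illustration, not the VCFFilter tool itself. It keeps header lines and any record where at least one alternate allele has AF above 0.10. The example records are made up for illustration, and the function names are hypothetical. In VCFFilter the corresponding idea would be an INFO filter expression along the lines of `AF > 0.1`.

```python
def parse_af(info):
    """Return the AF values from a VCF INFO string, or [] if AF is absent."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return [float(x) for x in field[3:].split(",")]
    return []


def filter_by_af(lines, min_af=0.10):
    """Yield header lines unchanged, plus records where any ALT allele has AF > min_af."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        info = columns[7]  # INFO is the 8th column of a VCF record
        if any(af > min_af for af in parse_af(info)):
            yield line


# Made-up example records, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\tAF=0.05;DP=900\n",
    "chr1\t200\t.\tC\tT\t60\t.\tAF=0.25;DP=870\n",
    "chr1\t300\t.\tG\tA,T\t70\t.\tAF=0.02,0.4;DP=910\n",
]

kept = list(filter_by_af(example))
for line in kept:
    print(line, end="")
```

Here the record at position 100 is dropped (AF 0.05), while positions 200 and 300 are kept because at least one alternate allele exceeds 10%.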

Hi, thank you very much. This is my very first work with Galaxy analysis, and I have to support a PhD student. May I ask if there is a tutorial that I can follow to call and filter variants, please?

It will need to be adjusted because you are using pooled samples, but this is probably a good start: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/dip/tutorial.html

The non-diploid tutorial uses prokaryotic examples, but describes the pooled options: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html

Other variant analysis tutorials are here: https://training.galaxyproject.org/training-material/topics/variant-analysis/