How to filter out rare variants (10%)

Hi, I have a huge file (107 GB). It is pooled data consisting of 50 whole genomes.
I need to filter out the rare variants (10%) from this 107 GB .bam file.
What tool should I use for this, please?
Thank you in advance.

Lidia

I can’t give a full solution right now, but you could start by looking at FreeBayes.

If you are starting from a BAM file, you need to both call and filter variants to get to all sites that are more than 10% variable.

Since you know the number of genomes, you can use `--pooled-discrete` with a `--ploidy` of 100 (assuming the 50 genomes are diploid, the pool holds 100 chromosome copies). Or you can use `--pooled-continuous`.
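As a rough sketch of how the two calling modes differ on the command line, the snippet below assembles a FreeBayes invocation for each. It assumes diploid individuals, and the file names `pooled.bam` and `reference.fa` as well as the helper `freebayes_command` are placeholders made up for illustration. In Galaxy the same options are set in the FreeBayes tool form rather than typed by hand.

```python
import shlex


def freebayes_command(bam, reference, n_individuals=None, ploidy_per_individual=2):
    """Build a FreeBayes argument list for pooled data.

    If n_individuals is given, use the discrete pooled model with the total
    number of chromosome copies as the ploidy; otherwise fall back to the
    continuous pooled model, which needs no copy count.
    """
    cmd = ["freebayes", "--fasta-reference", reference]
    if n_individuals is None:
        cmd.append("--pooled-continuous")
    else:
        total_copies = n_individuals * ploidy_per_individual
        cmd += ["--pooled-discrete", "--ploidy", str(total_copies)]
    cmd.append(bam)  # the BAM file is passed as a positional argument
    return cmd


# Placeholder file names; 50 individuals as described in the question.
discrete = freebayes_command("pooled.bam", "reference.fa", n_individuals=50)
continuous = freebayes_command("pooled.bam", "reference.fa")

print(shlex.join(discrete))
print(shlex.join(continuous))
```

The resulting lists can be passed to `subprocess.run` with standard output redirected to a VCF file, if FreeBayes is installed locally.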

Then you can filter using VCFfilter. The AF INFO field in the VCF contains the allele frequency information.
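To make the filtering step concrete, here is a minimal sketch of what a filter on the AF field does, written in plain Python rather than with VCFfilter itself. The function names and the three toy records are invented for illustration, and the threshold of 0.10 is taken from the question. With the vcflib `vcffilter` tool, the equivalent is most likely an INFO filter expression such as `AF > 0.1`.

```python
def max_allele_frequency(info_field):
    """Return the largest AF value in a VCF INFO string, or None if AF is absent."""
    for entry in info_field.split(";"):
        if entry.startswith("AF="):
            # Multi-allelic sites carry one comma-separated AF per ALT allele.
            values = [float(v) for v in entry[3:].split(",") if v != "."]
            return max(values) if values else None
    return None


def filter_by_af(vcf_lines, min_af=0.10):
    """Keep header lines and records whose highest AF exceeds min_af."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # always keep the VCF header
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th column
        af = max_allele_frequency(info)
        if af is not None and af > min_af:
            kept.append(line)
    return kept


# Toy records made up for this example; real input would come from FreeBayes.
records = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=200;AF=0.25",
    "chr1\t200\t.\tC\tT\t50\t.\tDP=180;AF=0.04",
    "chr1\t300\t.\tG\tA,T\t50\t.\tAF=0.02,0.15;DP=210",
]

kept = filter_by_af(records)
for line in kept:
    print(line)
```

Keeping a multi-allelic record when any one of its alleles passes the threshold is a design choice in this sketch; splitting such records first would let each allele be judged separately.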

Hi, thank you very much. This is my very first work with Galaxy analysis, and I have to support a PhD student. Is there a tutorial that I can follow to call and filter variants, please?

It will need to be adjusted because you are using pooled samples, but this is probably a good start: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/dip/tutorial.html

The non-diploid tutorial uses prokaryotic examples, but describes the pooled options: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html

Other variant analysis tutorials are here: https://training.galaxyproject.org/training-material/topics/variant-analysis/