Hi, I have a huge 107 GB file. It is pooled data consisting of 50 whole genomes.
I need to filter the rare variants (10%) from this 107 GB .bam file.
Which tool should I use for this, please?
Thank you in advance.
Lidia
I can't give a full solution right now, but you could start by looking at freebayes.
If you are starting from a BAM file, you need to both call and filter variants to get to all sites that are more than 10% variable.
Since you know the number of genomes, you can use `--pooled-discrete` with a `--ploidy` of 100 (assuming the genomes are diploid, 50 × 2 = 100). Or you can use `--pooled-continuous`.
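To make the ploidy arithmetic concrete, here is a minimal sketch that builds the corresponding freebayes command line. The file names `pool.bam` and `reference.fa` and the helper `freebayes_pooled_args` are hypothetical, and it assumes the 50 genomes are diploid. In Galaxy you would set the same options in the freebayes tool form rather than typing a command.

```python
def freebayes_pooled_args(bam, reference, n_genomes, ploidy_per_genome=2):
    """Build a freebayes argument list for a pool of known size.

    Assumption: every genome in the pool has the same ploidy (diploid by default).
    """
    # Total number of chromosome copies in the pool: 50 diploid genomes -> 100.
    pooled_ploidy = n_genomes * ploidy_per_genome
    return [
        "freebayes",
        "--fasta-reference", reference,
        "--pooled-discrete",
        "--ploidy", str(pooled_ploidy),
        bam,
    ]


# Hypothetical file names, for illustration only.
args = freebayes_pooled_args("pool.bam", "reference.fa", n_genomes=50)
print(" ".join(args))
```

If the pool size were not known, `--pooled-continuous` would replace the `--pooled-discrete` and `--ploidy` pair.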
Then you can filter using VCFFilter; the `AF` INFO field in the VCF contains the allele frequency information.
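As a rough illustration of what filtering on `AF` does, here is a small Python sketch that keeps VCF records where at least one alternate allele has a frequency above 10%. It is not how VCFFilter is implemented; the function names and the example records are made up, and it assumes `AF` is present as a comma-separated list in the INFO column.

```python
def alt_frequencies(info):
    """Return the AF values from a VCF INFO string, or [] if AF is absent."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return [float(x) for x in field[3:].split(",") if x != "."]
    return []


def filter_by_af(lines, min_af=0.10):
    """Yield header lines unchanged, plus records where any ALT allele has AF > min_af."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record (INFO is the 8th column)
        if any(af > min_af for af in alt_frequencies(cols[7])):
            yield line


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\t.\tDP=200;AF=0.25\n",
    "chr1\t200\t.\tC\tT\t50\t.\tDP=200;AF=0.02\n",
    "chr1\t300\t.\tG\tA,T\t50\t.\tDP=200;AF=0.05,0.4\n",
]
kept = list(filter_by_af(example))
```

Here the records at positions 100 and 300 are kept and the one at position 200 is dropped. If you want the variants below 10% instead, reverse the comparison.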
Hi, thank you very much. This is my very first time working with Galaxy, and I have to support a PhD student. May I ask if there is a tutorial that I can follow to call and filter variants, please?
It will need to be adjusted because you are using pooled samples, but this is probably a good start: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/dip/tutorial.html
The non-diploid tutorial uses prokaryotic examples, but describes the pooled options: https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html
Other variant analysis tutorials are here: https://training.galaxyproject.org/training-material/topics/variant-analysis/