Strand bias and placement bias in variant calls

cgreig · April 30, 2025, 9:38am

I am working to identify variants in WGS data from an intron-rich, GC-rich haploid eukaryotic alga, for which pipeline parameters derived from human or bacterial data may not be appropriate. To maximise the robustness of my variant calls, I have been trying to build a benchmark dataset of accurate variant calls from my population of UV mutants, initially processed with Freebayes, so that I can validate and optimise my variant-calling pipeline.
I see that your tutorials recommend INFO filters for strand bias and placement bias to distinguish correctly mapped variant reads, in particular SAP (strand balance probability for the alternate allele) and EPP (end placement probability). These are encoded as Phred-scaled estimates of the probability of a deviation from the expected ratio of 0.5, with a suggested cutoff of >20. However, my control, R16, carries a well-supported mutation that has been confirmed by sequencing, and it is eliminated by these filters: it scores 3.3935 on both measures.
Can you help me understand why this might be the case? I have copied the VCF entry below and provided a link to the relevant history, which includes many trial analyses. Thanks for your advice.

QUAL: 2587.33
FILTER: .
INFO: AB=0;ABP=0;AC=1;AF=1;AN=1;AO=51;CIGAR=1M1D3M;DP=51;DPB=40.8;DPRA=0;EPP=3.3935;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=595.754;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=3052;QR=0;RO=0;RPL=25;RPP=3.05288;RPPR=0;RPR=26;RUN=1;SAF=24;SAP=3.3935;SAR=27;SRF=0;SRP=0;SRR=0;TYPE=del;technology.ILLUMINA=1
FORMAT: GT:DP:AD:RO:QR:AO:QA:GL
Sample: 1:51:0,51:0:0:51:3052:-261.674,0
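To see what scores like these mean, the snippet below is a minimal sketch that recomputes them from the strand and position counts in the record. It assumes that freebayes derives SAP, EPP and RPP as a Phred-scaled Hoeffding upper bound on the probability of a split at least as lopsided as the one observed, given an expected 50:50 ratio. The helper names `parse_info` and `balance_phred` are hypothetical, and the INFO string is a subset copied from the record above. EPP cannot be recomputed here because its underlying read-end counts are not printed in the record.

```python
import math


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string (key=value pairs separated by ';') into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else "true"  # flag fields carry no value
    return fields


def balance_phred(a: int, b: int) -> float:
    """Phred-scaled Hoeffding bound for an a:b split, expecting 50:50.

    Assumption: this mirrors how freebayes computes SAP/EPP/RPP.
    Higher scores mean a more lopsided split (more evidence of bias).
    """
    n = a + b
    if n == 0:
        return 0.0  # no reads on either side, report 0 (as for SRP above)
    deviation = abs(a - n / 2)  # distance from the expected 50:50 split
    ln_p = math.log(0.5) - 2 * deviation ** 2 / n  # natural-log bound
    return -10 * ln_p / math.log(10)  # convert ln(p) to a Phred score


# Subset of the INFO field from the R16 record above.
info = "AO=51;EPP=3.3935;RPL=25;RPP=3.05288;RPR=26;SAF=24;SAP=3.3935;SAR=27"
fields = parse_info(info)

sap = balance_phred(int(fields["SAF"]), int(fields["SAR"]))
rpp = balance_phred(int(fields["RPL"]), int(fields["RPR"]))
print(f"SAP recomputed: {sap:.4f} (VCF: {fields['SAP']})")
print(f"RPP recomputed: {rpp:.4f} (VCF: {fields['RPP']})")
print(f"Perfect 50:50 split of 50 reads: {balance_phred(25, 25):.4f}")
print(f"All 51 alternate reads on one strand: {balance_phred(51, 0):.1f}")
```

Under this assumption the sketch reproduces SAP=3.3935 from SAF=24/SAR=27 and RPP of about 3.0529 from RPL=25/RPR=26. On this scale a perfectly balanced split gives the floor value of about 3.01, and the score rises as reads pile up on one side (all 51 alternate reads on one strand would score roughly 114). A score of 3.39 therefore points to a well-balanced variant, which suggests a cutoff of 20 is meant to reject records scoring above it rather than to keep only those above it.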

jennaj · April 30, 2025, 8:12pm

Hi @cgreig

Glad to see you are making progress! To reach many more people who work with Freebayes and different organisms, I would suggest posting this question to a scientific forum where those people spend their time. This forum is mostly for usage questions, and while some scientists are here, most are elsewhere. Biostars.org is one such site, and this is an example of a question similar to yours. You can share the data in your Galaxy history; just be very clear that this is all “working” and that you are interested in analysis ideas, feedback, or possibly paper suggestions.

In general, the parameters in the tutorials are chosen for the specific reference data used in each example. While they can be a starting place, considerations of speed and ease of use often make some of those choices different from what you would choose for a full-scale discovery analysis, even if the species were the same!

Hope this helps!

cgreig · May 1, 2025, 7:19am

Sorry you are unable to help. The question is about how the parameters can or should be used, so it might be of interest to this forum. Is there anyone, e.g. someone who wrote the tutorial, who might be able to enlighten me?
