Strand bias and placement bias in variant calls

I am working to identify variants in WGS data from an intron-rich, GC-rich, haploid eukaryotic alga, for which pipeline parameters derived from human or bacterial data may not be appropriate. To maximise the robustness of my variant calls, I have been trying to develop a benchmark dataset of accurate calls from my population of UV mutants, initially processed with Freebayes, to validate and optimise my variant calling pipeline.
I see that in your tutorials, you recommend INFO filters for strand bias and placement bias to distinguish correctly mapped variant reads, in particular SAP (Strand balance probability for the alternate allele) and EPP (End Placement Probability), which are encoded as Phred-scaled estimates of the probability of deviation from the expected ratio of 0.5, with a suggested cutoff of >20. However, my control, R16, has a well-supported and sequenced mutation that is eliminated by these filters; it scored 3.3935 on both measures.
Can you help me understand why this might be the case? I have copied the VCF entry below and provided a link to the relevant history, which includes many trial analyses. Thanks for your advice.

QUAL: 2587.33 . INFO: AB=0;ABP=0;AC=1;AF=1;AN=1;AO=51;CIGAR=1M1D3M;DP=51;DPB=40.8;DPRA=0;EPP=3.3935;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;NUMALT=1;ODDS=595.754;PAIRED=1;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=3052;QR=0;RO=0;RPL=25;RPP=3.05288;RPPR=0;RPR=26;RUN=1;SAF=24;SAP=3.3935;SAR=27;SRF=0;SRP=0;SRR=0;TYPE=del;technology.ILLUMINA=1 GT:DP:AD:RO:QR:AO:QA:GL 1:51:0,51:0:0:51:3052:-261.674,0
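
For readers following the numbers, here is a minimal sketch of how a Phred-scaled balance score of this kind could be computed from the two counts in the record above. It assumes a Hoeffding-style bound of the form 0.5 * exp(-2 * (n * 0.5 - k)^2 / n). That formula is an assumption and is not stated anywhere in this thread, although it does reproduce the SAP and RPP values in the entry from SAF/SAR and RPL/RPR. The function name `phred_balance_score` is hypothetical.

```python
import math


def phred_balance_score(left: int, right: int, expected: float = 0.5) -> float:
    """Phred-scaled bound on the probability of the observed imbalance.

    ASSUMPTION: uses a Hoeffding-style bound, p = 0.5 * exp(-2 * (n*e - k)^2 / n),
    where k = left, n = left + right and e = expected ratio.
    This is an illustrative sketch, not a confirmed description of Freebayes.
    """
    n = left + right
    if n == 0:
        return 0.0  # no observations, so no evidence of imbalance
    # Work in log space to avoid underflow for very skewed counts.
    ln_p = math.log(0.5) - 2.0 * (n * expected - left) ** 2 / n
    return -10.0 * ln_p / math.log(10)


# Counts taken from the VCF entry above.
sap = phred_balance_score(24, 27)  # SAF=24, SAR=27
rpp = phred_balance_score(25, 26)  # RPL=25, RPR=26
print(f"SAP ~ {sap:.4f}, RPP ~ {rpp:.5f}")

# A hypothetical fully one-sided site, for comparison.
print(f"51 vs 0 reads: {phred_balance_score(51, 0):.2f}")
```

Under this assumed formula, the score is smallest (about 3.01) when the two counts are equal and grows as the counts become more skewed, so a value of 3.3935 would correspond to a nearly balanced 24/27 split. Whether the tutorial's >20 cutoff is meant to keep or to remove such sites is exactly the question raised in the post.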

Hi @cgreig

Glad to see you are making progress! To reach many more people who are working with Freebayes and different organisms, I would suggest posting this question to a scientific forum where those people spend time. This forum is mostly for usage questions, and while some scientists are here, most are elsewhere. Biostars.org is one such site, and this is an example question similar to yours. You can share the data in the Galaxy history; just be really clear that this is all “working” and that you are interested in analysis ideas/feedback or possibly paper suggestions.

In general: the parameters in the tutorials are for the specific set of reference data used in the example. And while this can be a starting place, there are often considerations for speed and ease of use that make some of those choices different from what might be chosen for a full-scale analysis for discovery purposes, even if the species were the same!

Hope this helps!

Sorry you are unable to help. The question is about how the parameters can / should be used, so it might be of interest to this forum. Is there anyone, e.g. someone who wrote the tutorial, who might be able to enlighten me?