Questions regarding VarScan: IGV viewer & genotype quality & default p-value

#1

Dear community,

I have some questions regarding the readouts of VarScan.
I ran VarScan (SNP and Indel) on my BAM pileup and imported the resulting VCF data into IGV.

1st question: I got two tracks in IGV for each imported VCF. The upper track shows gray bars for the detected variants; the lower track (automatically named sample 1) shows blue (or light blue) bars. I do not understand why it is displayed this way, or what the different colors indicate.

2nd question: When I move my cursor over the bars, I can see a lot of information about each variant call. What does it mean when it shows “Genotype quality=0” or “QUAL: -10”? The depth for most of the called variants is above 1000, and the “Minimum base quality at a position to count a read” was set to 30 when running VarScan.

3rd question: When I set up the parameters before executing VarScan, the default value of “p-value threshold for calling variants” is 0.99. I thought it would be more logical to have it at 0.05, so I guess I do not understand what this parameter is asking for. Can anyone explain it to me?

My sincere appreciation,

Susan


#2

Hi Susan,
your question 1) Each variant description in a VCF file consists of two components: general descriptive statistics on the variant site (like the quality/reliability of the variant call) and sample-specific measures (like the most likely genotype of a sample at that site). This distinction matters less with your tool because VarScan analyzes only one sample at a time (so it only produces VCF files with one sample, sample1), but in general VCF files can describe the variants found in any number of samples. IGV shows the general variant characteristics on the first track (the grey bars) and the sample-specific information on the following tracks (the blue bars in your single-sample case). The sample bars are colored by genotype: with IGV's default colors, dark blue means heterozygous and light blue (cyan) means homozygous for the variant allele. The height of the bar indicates coverage, i.e., how many NGS reads provide information about the variant site. For the first track that’s total coverage across all samples.
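To make the site-level vs. sample-level split concrete, here is a minimal Python sketch that splits one VCF data line into the eight fixed site columns (CHROM through INFO) and the per-sample FORMAT fields, then classifies the GT value into the genotype classes IGV colors by. The record and the sample name `Sample1` are made up for illustration (loosely modeled on a single-sample VarScan-style line, not copied from real output), and `split_record` / `genotype_class` are hypothetical helpers, not part of any library.

```python
# Hypothetical single-sample VCF data line; every value here is invented.
VCF_LINE = "\t".join([
    "chr1", "12345", ".", "A", "G", ".", "PASS", "DP=1200",  # site-level columns
    "GT:GQ:DP:RD:AD:FREQ",                                     # FORMAT keys
    "0/1:0:1200:900:300:25%",                                  # Sample1 values
])

FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def split_record(line: str, sample_names: list[str]) -> tuple[dict, dict]:
    """Split a VCF data line into site-level fields and per-sample fields."""
    cols = line.rstrip("\n").split("\t")
    site = dict(zip(FIXED_COLUMNS, cols[:8]))  # shown on IGV's first (variant) track
    format_keys = cols[8].split(":")
    samples = {}
    for name, raw in zip(sample_names, cols[9:]):
        # One sample column per name; shown on IGV's per-sample tracks.
        samples[name] = dict(zip(format_keys, raw.split(":")))
    return site, samples


def genotype_class(gt: str) -> str:
    """Classify a GT string such as '0/1' or '1|1' into a genotype class."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "NO_CALL"
    if all(a == "0" for a in alleles):
        return "HOM_REF"
    if len(set(alleles)) == 1:
        return "HOM_VAR"
    return "HET"


site, samples = split_record(VCF_LINE, ["Sample1"])
print(site["QUAL"])  # "." means the site quality is missing
print(samples["Sample1"]["GT"], genotype_class(samples["Sample1"]["GT"]))
```

With more samples, `sample_names` would just get longer and IGV would add one track per sample; the site columns stay shared.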

your question 2) VarScan does not calculate a variant call quality (the QUAL column of its VCF is just “.”), and in your case it also does not calculate genotype quality scores (see the answer to question 3 below). You can think of these as measures of, respectively, the likelihood that a variant allele really exists in any of your samples and the likelihood that the called genotype of a given sample is correct. Since they are missing, IGV just shows its internal placeholder numbers for that state, which is why you see values such as “QUAL: -10”.
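Both QUAL and GQ are Phred-scaled in VCF, i.e. Q = -10·log10(p_error), the same scale as the base-quality cutoff of 30 you set (30 corresponds to a 1-in-1000 error probability). The short Python sketch below shows the conversion in both directions and one plausible explanation for the -10: assuming IGV reads VCFs through htsjdk, which keeps a missing QUAL internally as log10(p_error) = 1.0, converting that back to Phred gives -10 × 1.0 = -10. Treat that last part as an assumption about IGV's internals; `displayed_qual` is a hypothetical helper, not an IGV function.

```python
import math


def phred(p_error: float) -> float:
    """Phred-scale an error probability: Q = -10 * log10(p_error)."""
    return -10.0 * math.log10(p_error)


def prob_from_phred(q: float) -> float:
    """Inverse conversion: error probability for a Phred score Q."""
    return 10.0 ** (-q / 10.0)


# Assumption about IGV/htsjdk internals: a missing QUAL (".") is stored as
# log10(p_error) = 1.0, so converting it back to the Phred scale yields -10.
MISSING_LOG10_PERROR = 1.0


def displayed_qual(qual_field: str) -> float:
    """Hypothetical helper mimicking how a missing QUAL would surface as -10."""
    if qual_field == ".":
        return -10.0 * MISSING_LOG10_PERROR
    return float(qual_field)


print(phred(0.001))          # approximately 30: the base-quality cutoff you used
print(prob_from_phred(30))   # approximately 0.001
print(displayed_qual("."))   # -10.0
```

So a QUAL of -10 is not a real (negative) quality; on this reading it is simply how “no value” comes out on the Phred scale.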

your question 3) That one is trickier to answer because the VarScan documentation is not very clear on this point, but as far as I understand, the default value of 0.99 serves to turn off the p-value calculation (which uses Fisher’s exact test and is rather slow). Fisher’s exact test addresses the statistical significance of observing, e.g., 4 variant-supporting reads out of 20 when the expectation is 0 (a wild-type genotype of your sample at the site). Only when that p-value is below your threshold will a heterozygous genotype be called, and the p-value from the test is then turned into a genotype quality score. So, yes, if you want to use these “exact” statistics and have GQ values calculated, you should set the p-value parameter to something like 0.1 or 0.5.
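Here is a rough Python sketch of the Fisher’s exact test idea for the 4-out-of-20 example. It is not VarScan’s code: the 2×2 table (observed ref/variant reads vs. an all-reference expectation at the same depth), the two-sided test, the “call 0/0 otherwise” rule and the GQ cap of 99 are all my assumptions, and it does not model VarScan’s shortcut of skipping the test at the 0.99 default. `fisher_exact_two_sided` and `call_genotype` are hypothetical names.

```python
import math


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    total = 0.0
    # Add up every table with the same margins that is no more likely than the
    # observed one (a small relative tolerance guards against float round-off).
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(total, 1.0)


def call_genotype(ref_reads: int, var_reads: int, p_threshold: float):
    """Hypothetical caller: heterozygous only if the test p-value beats the threshold."""
    depth = ref_reads + var_reads
    # Assumed table: observed (ref, var) vs. expected (depth, 0) for a wild-type site.
    p = fisher_exact_two_sided(ref_reads, var_reads, depth, 0)
    if p < p_threshold:
        # Phred-scale the p-value into a GQ; capping at 99 is an assumption.
        gq = 99 if p <= 0 else min(99, round(-10 * math.log10(p)))
        return "0/1", p, gq
    # Not significant: treated as a reference call in this sketch.
    return "0/0", p, None


for threshold in (0.5, 0.1, 0.05):
    print(threshold, call_genotype(16, 4, threshold))
```

With these counts the two-sided p-value is about 0.106, so in this sketch only the 0.5 threshold produces a heterozygous call (with GQ 10), while 0.1 and 0.05 do not. That illustrates why a 0.05 cutoff can be stricter than it looks at low depth; VarScan’s actual table or test direction may differ, so its numbers need not match these.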
