What's the dip at the 50th percentile of gene body coverage by RSeQC for WGS results?

I used NGS: RSeQC - Gene Body Coverage (BAM) Read coverage over gene body (Galaxy Version). Input: BAM; reference gene model: BED12.
The input is the BAM result of a WGS 2x75 run on a NextSeq; the reference was hg38.
For lib#261, I got this significant drop at the 50th percentile.
Can anyone explain it to me?
Or is there a better tool for checking coverage uniformity?

Thanks for clarifying the usage.

X-axis = Gene Body Percentile (5’ > 3’) – the length of every transcript region is normalized to 100
Y-axis = Coverage – read coverage counts per transcript region

This is a graph of the RSeQC txt output, an alternative view of the pdf output produced by RSeQC (where the Y-axis is % coverage). This type of graph can be generated by MultiQC or other plotting methods.
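To pin down exactly where the drop sits, the txt output can be scanned for its lowest bin. This is only a rough sketch: it assumes the txt file is tab-separated with a header row starting with "Percentile" followed by 1 to 100, then one row per sample. The data below is toy data, and the function names are made up for illustration.

```python
from statistics import median

def parse_gene_body_coverage(text):
    """Parse the assumed layout: a 'Percentile' header row, then one row per sample."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split("\t")
        if fields[0] == "Percentile":
            continue  # skip the header row
        rows[fields[0]] = [float(v) for v in fields[1:]]
    return rows

def find_dip(values):
    """Return (percentile, coverage, ratio to the median) for the lowest bin."""
    lowest = min(range(len(values)), key=values.__getitem__)
    return lowest + 1, values[lowest], values[lowest] / median(values)

# Toy data, not real output: flat coverage of 100 with a dip in bin 50.
toy = [100.0] * 100
toy[49] = 20.0
example = "Percentile\t" + "\t".join(str(i) for i in range(1, 101)) + "\n"
example += "lib261\t" + "\t".join(str(v) for v in toy) + "\n"

coverage = parse_gene_body_coverage(example)
percentile, depth, ratio = find_dip(coverage["lib261"])
print(percentile, depth, ratio)
```

A ratio well below 1 at a single percentile points to a localized drop rather than a gradual 5’/3’ trend.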

What this means: The WGS reads are not mapping well to the middle of transcripts. There is 3’/5’ bias, with some type of problem at the very end of the 3’ UTR (sequencing artifact? library construction bias?).

Some things to check:

  • The sample size is pretty small and might be biased. Try checking a random sample (or all) of the properly paired reads with a MAPQ of 30 or higher.

  • Does the BED12 represent a UCSC “Genes and Gene Predictions” track that is complete (full gene)? RefSeq and Ensembl are good choices. Avoid GenBank’s “All mRNA” and other types of fragmented/high duplication tracks.

  • WGS data needs to be mapped with an unspliced mapping tool. Choices can include BWA/BWA-MEM and Bowtie2. Avoid spliced mapping tools – those are for spliced data (e.g. RNA-seq).

  • You might want to run FastQC on the original fastqsanger datasets to find out about artifacts and other sequence problems that may be present.
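The read filter from the first check can be sketched on plain SAM text as below. The flag bits come from the SAM format; the function name, the toy reads, and the choice to also drop unmapped, secondary, and supplementary records are assumptions for illustration, not part of any Galaxy tool.

```python
# Bit values defined by the SAM format for the FLAG column.
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800

def keep_read(sam_line, min_mapq=30):
    """True for a properly paired primary alignment with MAPQ >= min_mapq."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return bool(flag & PROPER_PAIR) and mapq >= min_mapq

# Toy SAM records (hypothetical reads), header lines start with '@'.
sam_text = "\n".join([
    "@SQ\tSN:chr1\tLN:1000000",
    "r1\t99\tchr1\t100\t60\t75M\t=\t300\t275\t*\t*",   # proper pair, MAPQ 60
    "r2\t99\tchr1\t500\t10\t75M\t=\t700\t275\t*\t*",   # proper pair, MAPQ 10
    "r3\t65\tchr1\t900\t60\t75M\t=\t5000\t0\t*\t*",    # paired, not proper
    "r4\t355\tchr1\t950\t60\t75M\t=\t1100\t225\t*\t*", # secondary alignment
])

kept = [line.split("\t")[0]
        for line in sam_text.splitlines()
        if not line.startswith("@") and keep_read(line)]
print(kept)
```

On the command line, `samtools view -f 2 -q 30` applies the proper-pair and MAPQ parts of this filter directly to a BAM.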

Hope that helps!

Thanks jennaj for the explanation.
For whole genome sequencing data, as is the case here, what is segmented into 100 sections by RSeQC?

These are the transcripts defined by the BED12 dataset.

Please see: http://rseqc.sourceforge.net/#genebody-coverage-py
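As a rough illustration of what "segmented into 100 sections" means for one BED12 transcript, here is a minimal sketch. It is not RSeQC's actual code: the function names and the toy transcript are made up, and it assumes the 100 sample points are spread evenly over the exonic bases only, read 5’ to 3’.

```python
def exon_blocks(bed12_line):
    """Return (chrom, strand, [(start, end), ...]) for the exons of a BED12 record."""
    f = bed12_line.rstrip("\n").split("\t")
    chrom, tx_start, strand = f[0], int(f[1]), f[5]
    sizes = [int(s) for s in f[10].strip(",").split(",")]   # blockSizes
    starts = [int(s) for s in f[11].strip(",").split(",")]  # blockStarts (relative)
    blocks = [(tx_start + s, tx_start + s + size) for s, size in zip(starts, sizes)]
    return chrom, strand, blocks

def percentile_positions(bed12_line, bins=100):
    """Pick `bins` evenly spaced exonic positions, ordered 5' to 3'."""
    chrom, strand, blocks = exon_blocks(bed12_line)
    bases = [pos for start, end in blocks for pos in range(start, end)]
    if strand == "-":
        bases.reverse()  # the 5' end of a minus-strand transcript is its highest coordinate
    n = len(bases)
    return [bases[(i * n) // bins] for i in range(bins)]

# Hypothetical transcript: two 100 bp exons separated by a 200 bp intron.
toy_plus = "chr1\t1000\t1400\ttoy_tx\t0\t+\t1000\t1400\t0\t2\t100,100,\t0,300,"
toy_minus = toy_plus.replace("\t+\t", "\t-\t")

plus_positions = percentile_positions(toy_plus)
minus_positions = percentile_positions(toy_minus)
print(plus_positions[49], plus_positions[50])
```

Under this assumption, neighbouring percentiles can be far apart in the genome (bins 50 and 51 of the toy transcript sit on either side of the intron), and intronic bases are never sampled, so the plot describes only the exonic part of WGS coverage.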

For WGS data, comparing this result to the complete genome coverage (not just transcript regions) can be informative. See the tool group BEDTools. Example tools of interest: Genome Coverage, MakeWindowsBed. Tool manual: https://bedtools.readthedocs.io/en/latest/content/bedtools-suite.html (explains the command-line options that are mirrored on the Galaxy tool forms)
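Conceptually, the genome-wide check boils down to cutting the genome into fixed windows and comparing mean depth across them. The sketch below is a pure-Python stand-in for that idea on toy intervals, not the BEDTools implementation; the function names, window size, and read intervals are all made up.

```python
from statistics import mean, pstdev

def make_windows(chrom_length, window_size):
    """Fixed-size, half-open windows tiling a chromosome."""
    return [(s, min(s + window_size, chrom_length))
            for s in range(0, chrom_length, window_size)]

def window_mean_depth(reads, chrom_length, window_size):
    """Mean per-base depth in each window, from half-open (start, end) read intervals."""
    delta = [0] * (chrom_length + 1)
    for start, end in reads:      # difference array: +1 at start, -1 at end
        delta[start] += 1
        delta[end] -= 1
    per_base, running = [], 0
    for i in range(chrom_length):
        running += delta[i]
        per_base.append(running)
    return [mean(per_base[s:e]) for s, e in make_windows(chrom_length, window_size)]

# Toy reads on a hypothetical 40 bp chromosome: the third window is under-covered.
reads = [(0, 10), (0, 10), (10, 20), (10, 20), (20, 30), (30, 40), (30, 40)]
depths = window_mean_depth(reads, chrom_length=40, window_size=10)

# Coefficient of variation across windows: lower means more uniform coverage.
cv = pstdev(depths) / mean(depths)
print(depths, round(cv, 3))
```

If the window-level depths look even while the gene body plot still dips, the dip is more likely tied to the transcript annotation than to the library itself.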