FastQC Overrepresented sequences percentage

Hi, I was wondering whether the percentage in the Overrepresented sequences section of the FastQC output is really a percentage. That is, would 5% show as 5, or as 0.05?

Thank you!

Hi @gperron

These are true percentages, written out with as many significant digits as there is room to display.

Your example

  • Five percent == 5.0
  • Point zero five percent == 0.05

Real data, graph view

  • One point five five percent == “1.5504756818485261”

Same data, raw view

>>Overrepresented sequences	fail
#Sequence	Count	Percentage	Possible Source
CGGTGCTCGACCCCTCCGACCCCCGCCGGCCGCTTCGAGCCTGAGCCCTT	76412	1.5504756818485261	No Hit
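If it helps, here is a minimal Python sketch of reading that raw row and checking the units. It assumes the columns are tab-separated in the order given by the header line; the function name `parse_overrepresented_row` is made up for illustration and is not part of FastQC or Galaxy.

```python
# One data row copied from the raw view above (columns are tab-separated).
row = (
    "CGGTGCTCGACCCCTCCGACCCCCGCCGGCCGCTTCGAGCCTGAGCCCTT"
    "\t76412\t1.5504756818485261\tNo Hit"
)


def parse_overrepresented_row(line):
    """Split one row into its four columns (hypothetical helper)."""
    sequence, count, percentage, source = line.rstrip("\n").split("\t")
    return {
        "sequence": sequence,
        "count": int(count),
        # Already in percent units: 1.55 means 1.55%, not 155%.
        "percentage": float(percentage),
        "source": source,
    }


record = parse_overrepresented_row(row)

# Divide by 100 only if you need a fraction for further arithmetic.
fraction = record["percentage"] / 100

# Sanity check: count / fraction gives the read total the percentage implies.
implied_total = record["count"] / fraction

print(f"{record['percentage']:.2f}% of reads, fraction {fraction:.4f}")
print(f"implied total reads: about {implied_total:,.0f}")
```

The implied total comes out at a few million reads, which is plausible for a sequencing run. That is another way to confirm the value is a percentage and not a fraction.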

The usage/display in Galaxy is the same as the original tool, so the documentation linked from the bottom of the tool form can be a really useful reference wherever you are running it. There is per-module help plus example reports of “good” and “bad” data and other common use cases.

Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data

Index of /projects/fastqc/Help

Overrepresented Sequences

This module lists all of the sequences which make up more than 0.1% of the total. To conserve memory, only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
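To make that description concrete, here is a rough Python sketch of the tracking rule as the help text words it. It is not FastQC's actual code: the function name and the `track_limit` and `threshold_pct` parameters are assumptions for illustration, with defaults taken from the numbers quoted above.

```python
def overrepresented(reads, track_limit=100_000, threshold_pct=0.1):
    """Illustrative only: count sequences first seen within the first
    `track_limit` reads, then report those above `threshold_pct` percent."""
    counts = {}
    total = 0
    for read in reads:
        total += 1
        if read in counts:
            # Already tracked: keep counting to the end of the file.
            counts[read] += 1
        elif total <= track_limit:
            # New sequences are only added while inside the initial window.
            counts[read] = 1
    if total == 0:
        return []
    hits = [
        (seq, n, 100.0 * n / total)  # percentage in percent units
        for seq, n in counts.items()
        if 100.0 * n / total > threshold_pct
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


# Toy data with a tiny window so the blind spot is visible.
reads = ["AAAA"] * 3 + ["CCCC"] + ["GGGG"] * 6
hits = overrepresented(reads, track_limit=4)
for seq, n, pct in hits:
    print(seq, n, pct)
```

In the toy data, "GGGG" makes up most of the reads but first appears after the tracking window closes, so it is never reported. That is the blind spot the last sentence of the help text warns about.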

Thank you Jennifer!

Best,

Gabrielle
