Hi, I was wondering if the percentage in the overrepresented sequences section of the FastQC output is really a percentage, as in 5% would show as a 5? Or would 5% show as 0.05?
Thank you!
Hi @gperron
These are true percentages, so 5% is shown as 5, not 0.05. The values are written with as many significant digits as there is space to write them out.
Your example
Real data, graph view
Same data, raw view
>>Overrepresented sequences fail
#Sequence Count Percentage Possible Source
CGGTGCTCGACCCCTCCGACCCCCGCCGGCCGCTTCGAGCCTGAGCCCTT 76412 1.5504756818485261 No Hit
The usage/display in Galaxy is the same as the original tool, so the documentation linked from the bottom of the tool form can be a really useful reference wherever you are running it. There is per-module help plus example reports of “good” and “bad” data and other common use cases.
Babraham Bioinformatics - FastQC A Quality Control tool for High Throughput Sequence Data → Index of /projects/fastqc/Help → Overrepresented Sequences
This module lists all of the sequence which make up more than 0.1% of the total. To conserve memory only sequences which appear in the first 100,000 sequences are tracked to the end of the file. It is therefore possible that a sequence which is overrepresented but doesn't appear at the start of the file for some reason could be missed by this module.
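If you want to check the units yourself, here is a minimal Python sketch that reads the Overrepresented sequences block from the raw output and converts the Percentage column to a fraction. The function name `parse_overrepresented` and the dictionary keys are made up for illustration and are not part of FastQC. It assumes the block ends with a `>>END_MODULE` line and that columns are separated by whitespace or tabs, as in the raw view above.

```python
def parse_overrepresented(text):
    """Collect rows from the '>>Overrepresented sequences' block of raw FastQC output.

    Hypothetical helper for illustration only; not part of FastQC.
    """
    rows = []
    in_module = False
    for line in text.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if in_module and line.startswith(">>END_MODULE"):
            break
        # Skip lines outside the block, the '#' header line, and blank lines.
        if not in_module or line.startswith("#") or not line.strip():
            continue
        # Split into at most 4 fields so a source such as "No Hit" stays whole.
        seq, count, pct, source = line.split(None, 3)
        percent = float(pct)
        rows.append({
            "sequence": seq,
            "count": int(count),
            "percent": percent,          # as reported: 1.55 means 1.55%
            "fraction": percent / 100.0, # the same value as a proportion
            "source": source.strip(),
        })
    return rows


# The row from the raw view above, wrapped in the module markers.
sample = """>>Overrepresented sequences fail
#Sequence Count Percentage Possible Source
CGGTGCTCGACCCCTCCGACCCCCGCCGGCCGCTTCGAGCCTGAGCCCTT 76412 1.5504756818485261 No Hit
>>END_MODULE
"""

rows = parse_overrepresented(sample)
for row in rows:
    print(row["count"], row["percent"], row["fraction"], row["source"])
```

For the row above, the reported value of about 1.55 corresponds to a fraction of about 0.0155, which is consistent with the 0.1% cutoff in the documentation being compared against percent values.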
Thank you Jennifer!
Best,
Gabrielle