FastQ ASCII table for raw sequences' QC

Hello everybody,

I’ve been reading many tutorials and educational materials about basic quality control procedures for .fastq files, but I’m confused because they don’t seem to apply to my experimental data. I was able to upload some .ab1 files from semi-automated Sanger sequencing to the Galaxy platform and convert them into .fastq files using https://www.usegalaxy.org.au

No problems up to that point. The trouble began after I retrieved the results, because they seem to include symbols from two different ASCII tables used for this purpose: my files contain characters like “#” and “%”, which belong to the base-33 ASCII table, but they also contain uppercase letters from the end of the alphabet (R, S, T, U, V, W, X, Y and Z), which belong only to the base-64 ASCII table.

Please note that I’m taking as reference the ASCII quality score tables available at Quality (Phred) scores.

I’m confused! If .fastq sequence scores go from 1 (or zero) to 40, how can it be that I have a quality score composed of 64 characters?

Any clarification is greatly appreciated.

P.S.: some time after posting, I remembered having seen somewhere an option to share my Galaxy history with other people, but I could no longer find it once the question was posted.

Hi @lcfiorini
in the Phred+33 format the range of allowed ASCII values extends from 33 (= !, the first non-whitespace printable character of the ASCII table) all the way to 126 (= ~, its last printable character). This means the encodable Phred quality scores range from 0 to 93.
Everything beyond that is just machine- and base-caller-specific convention.
Phred+64 is of historical interest only nowadays (Phred+33, also called fastq Sanger format, is the de facto standard). Because it uses ASCII value 64 (= @) as its lowest value but, like Phred+33, has to end at 126 for printability, it can represent only a smaller range of quality scores (0 to 62).
At the time when Phred+33 and Phred+64 were both in active use, their ranges would normally not overlap much, because base calls from the sequencers of that era had relatively large error probabilities. However, Phred+33 scores not extending (or not extending far) into the Phred+64 range was never part of a format specification (just a convention at best).
Modern sequencers can certainly produce base quality scores above 40, so what you’re seeing is not at all surprising.
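To make the arithmetic concrete, here is a minimal Python sketch that decodes the characters mentioned in the question under both offsets. The function name `decode_quality` and the example string are made up for illustration; the only thing taken from the formats themselves is "score = ASCII value minus offset".

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


# Hypothetical quality string built from the characters named in the question.
example = "#%RSTUVWXYZ"

# Read as Phred+33: every character maps to a valid score between 0 and 93.
phred33_scores = decode_quality(example, offset=33)
print(phred33_scores)

# Read as Phred+64: '#' and '%' map to negative values, which is not a valid
# Phred score, so this string cannot be Phred+64.
phred64_scores = decode_quality(example, offset=64)
print(phred64_scores)
```

Under Phred+33, “#” and “%” are simply low scores (2 and 4) and R to Z are high scores (49 to 57), so a single file containing both is consistent with one encoding.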

The last paragraph of the page you linked to in your question, “Recognizing the format”, tries to explain what I just said, too.
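As a rough illustration of that kind of format recognition, the sketch below guesses the offset from the lowest character seen in a set of quality strings. It is not the method of the linked page or of any Galaxy tool. The function name `guess_offset` is hypothetical, and it assumes, as described above, that Phred+64 never uses a character below “@” (ASCII 64).

```python
from typing import Iterable, Optional


def guess_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Guess the Phred offset of FASTQ quality strings.

    Returns 33 if any character lies below '@' (ASCII 64), because such a
    character cannot occur in Phred+64 under the assumption stated above.
    Returns None if every character is '@' or higher: such data is valid
    in both encodings, so the offset cannot be decided from the characters.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        raise ValueError("no quality characters given")
    if min(codes) < 33 or max(codes) > 126:
        raise ValueError("character outside the printable range 33..126")
    if min(codes) < 64:
        return 33
    return None  # ambiguous: could be Phred+64, or Phred+33 with high scores


print(guess_offset(["#%RSTUVWXYZ"]))  # contains '#', so Phred+33
print(guess_offset(["RSTUVWXYZ"]))    # only letters, so ambiguous
```

The ambiguous case is why the check is only a heuristic: a Phred+33 file made up entirely of high-quality bases looks the same as a Phred+64 file.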