FASTQ Groomer, fastq.gz and fastqsanger

Tahleel · October 22, 2024, 5:38pm

As the quality scores of the FASTQ files (.fastqsanger.gz) generated by the Illumina 1.9 pipeline are scaled to Sanger Phred+33, I do not need to apply FASTQ Groomer.

Do I still need to manually assign the format/datatype of the files to fastqsanger before continuing with the data analysis in Galaxy? I tried to do this, but the process failed after running the FastQC tool.
So, how can I ensure that Galaxy recognizes the correct datatype for my files if I don't change their format to fastqsanger?

igor · October 22, 2024, 11:39pm

Hi @Tahleel,
Galaxy automatically detects and assigns a datatype when files with reads are uploaded. This works well for short reads, but not for some long reads, for example when a wide range of Phred scores is used. You can upload the reads and check the assigned datatype. If a dataset got the “fastq.gz” datatype, change it to “fastqsanger.gz” via Edit Attributes (pencil icon) > Datatypes tab > in the Assign Datatype section, select “fastqsanger.gz” from the pull-down menu. Alternatively, you can specify “fastqsanger.gz” during upload if your files are GZipped.

I do not recommend FASTQ Groomer in this situation. It creates a copy of the file, while all you need is correct metadata (the datatype). Plus, the Groomer is super slow with default settings. You can speed it up by disabling the summary. FASTQ Groomer is needed only for reads with the old Illumina encoding.
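To illustrate what "old Illumina encoding" means in practice, here is a minimal Python sketch that guesses the quality offset from the characters in the quality lines. It is not part of Galaxy or FASTQ Groomer: the function name `guess_phred_offset` and the cut-off values are assumptions based on the commonly cited ASCII ranges (Phred+33 starts at `!`, code 33, while the older +64 encodings do not go below `;`, code 59). It is a heuristic and returns `None` when the range is ambiguous.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Heuristic only: returns None if the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' cannot occur in the old +64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' are not expected in Illumina 1.8+ (Phred+33) output.
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII#5<"]))   # contains '#', so Sanger / Phred+33
    print(guess_phred_offset(["hhhh^^ff"]))  # only high characters, so old +64 encoding
```

If the guess is 33, the reads are already in the Sanger scale and only the datatype label needs to be right; grooming would be relevant only for the +64 case.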

Kind regards,
Igor

Tahleel · October 23, 2024, 2:39am

Hi, @igor
Thank you for your reply. The files I have are fastq.gz, but Galaxy automatically recognizes them as fastqsanger.gz. Should I change the format to fastqsanger via the Edit Attributes (pencil icon), or can I simply continue with my analysis without making any changes?

igor · October 23, 2024, 6:49am

Hi @Tahleel,

No, keep the datatype assigned by Galaxy.

Your reads are in FASTQ format compressed with GZip.

Galaxy assigns metadata called a datatype to files, something like a label in a shop. Tools in Galaxy handle data according to datatypes. For example, the same text file can have the “tabular” or “txt” datatype, or some other one. With the tabular datatype it is treated as columns (think Excel); with the txt datatype it is treated as a set of lines.

If you assign the fastqsanger datatype to GZipped FASTQ files, tools will expect plain text FASTQ data, and jobs will fail. GZipped files should have the fastqsanger.gz (or fastq.gz) datatype. The opposite is also true. For example, if you assign the “fastqsanger.gz” datatype to a plain text FASTQ file in Galaxy, it will not compress the file. The tools will expect GZipped data and will most likely fail.

Datatypes (labels) in Galaxy should match data format.
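The rule can be checked on a local file before upload. The sketch below is only an illustration, not how Galaxy itself sniffs datatypes: the helper names `is_gzipped` and `datatype_matches` are made up for this example, and it relies only on the fact that every gzip stream starts with the bytes `1f 8b`.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def datatype_matches(path, datatype):
    """Check that a '.gz' datatype label is used for gzipped data, and only for it."""
    return is_gzipped(path) == datatype.endswith(".gz")


if __name__ == "__main__":
    record = b"@read1\nACGT\n+\nIIII\n"
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "reads.fastq")
        packed = os.path.join(tmp, "reads.fastq.gz")
        with open(plain, "wb") as handle:
            handle.write(record)
        with gzip.open(packed, "wb") as handle:
            handle.write(record)
        print(datatype_matches(packed, "fastqsanger.gz"))  # compressed data, .gz label
        print(datatype_matches(packed, "fastqsanger"))     # compressed data, plain label
        print(datatype_matches(plain, "fastqsanger"))      # plain data, plain label
```

A mismatch reported by such a check corresponds to the situation where Galaxy tools would get data in a form they do not expect.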

I hope I have not confused you.

Kind regards,
Igor

Tahleel · October 23, 2024, 7:45am

It is very clear. Thank you very much for your help, @igor !
