FastQC Troubleshooting

Sonenshine · August 19, 2024, 1:11pm

Hi Jennifer:

I am having trouble with fastqc not working on virtually any uploads I put into Galaxy. I uploaded diverse bacterial genomes, made sure that the attribute (typically fastq or fastq.gz) was assigned, and proceeded to fastqc. Invariably, it crashed. I tried text files, smaller files (1 megabyte), etc. Any suggestions about why this is happening?

Thanks

Daniel

jennaj · August 23, 2024, 11:43pm

Hi @Sonenshine

It looks like other people had trouble helping with your new question. Would you please provide a bit more context? Which server are you working at (URL)? Where were the reads sourced and can you share a small example? What do the job logs report as messages? Are you using the most current version of that tool at that server (check with the Options menu)?

Sharing your history is one way to do all of this, and allows the most people to provide feedback. See the banner at the forum for all options for sharing your work (screenshots of those same data points, or copy/paste text, etc).

Thanks!

Sonenshine · August 24, 2024, 6:23pm

Hi Jennifer:

Thanks for taking the time to investigate why Galaxy is not processing the bacteria genomes I uploaded.

So, I selected the tutorial “reads to counts” from the transcriptome database in the Galaxy Training Network United States. That is the URL.

First, I loaded the abbreviated version of the bacterial genome, Rickettsia bellii, only 1 megabyte. After it loaded, I changed the datatype to fastq.

Next, I tried to run the tool, fastqc, since there were no purity scores for each gene sequence. See screen shot # 50 attached. Fastqc failed!!

Next, I tried it again but changed the datatype to “fastqsanger”, because that worked when I ran the procedure for a different bacteria, Rickettsia conorii. That also failed (screen shot 52).

Next, concerned that the system may have preferred the original download, I uploaded the full genome for Rickettsia rickettsii, now with the datatype fastq.gz, and repeated running fastqc, and that failed (screen shot 53).

Perhaps this tutorial is not optimized for prokaryotes and I need to go to a different set of tutorials? If that is true, can you advise where to find it?

Thanks for your help.

Daniel

Dr. Daniel E. Sonenshine

Professor (Emeritus)

Department of Biological Sciences

Old Dominion University

Norfolk, Virginia 23529

Tel. (757) 404-4331

Fax (301) 897-0935

e-mail: dsonensh@odu.edu

jennaj · August 26, 2024, 6:25pm

Hi @Sonenshine, thanks for explaining, and great screenshots with all the details, very helpful!

Loading read data should always use default settings. If you don’t get the expected datatype, then this nearly always indicates some content problem to address.
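As a rough way to look for a content problem before uploading, the four-line record structure of a FASTQ file can be checked locally. This is only a minimal sketch: the function name `fastq_problems` is made up for illustration, it assumes uncompressed text with unwrapped (single-line) sequences, and it is not how Galaxy's Upload tool detects datatypes.

```python
def fastq_problems(text, max_records=1000):
    """Return a list of human-readable problems found in FASTQ text.

    Hypothetical helper: assumes each record is exactly four lines
    (header, sequence, '+' line, quality string).
    """
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]
    if lines[0].startswith(">"):
        # FASTA headers start with '>', FASTQ headers start with '@'.
        return ["first line starts with '>': this looks like FASTA, not FASTQ"]

    problems = []
    if len(lines) % 4 != 0:
        problems.append("line count is not a multiple of 4")

    # Walk over complete four-line records only.
    for i in range(0, min(len(lines) - 3, max_records * 4), 4):
        header, seq, plus, qual = lines[i:i + 4]
        rec = i // 4 + 1
        if not header.startswith("@"):
            problems.append(f"record {rec}: header does not start with '@'")
        if not plus.startswith("+"):
            problems.append(f"record {rec}: third line does not start with '+'")
        if len(seq) != len(qual):
            problems.append(f"record {rec}: sequence and quality lengths differ")
    return problems


if __name__ == "__main__":
    example = "@read1\nACGT\n+\nIIII\n"
    print(fastq_problems(example) or "no structural problems found")
```

An empty list only means the basic layout looks right. It says nothing about how the quality scores are scaled.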

The technical issue is how the + lines (the separator line before each quality string) are annotated: they are in a legacy Illumina format, and the quality scores themselves may use a legacy Illumina scaling. This has major scientific implications as well. Why? If the scaling differs from what tools expect to process, every statistic calculated from the quality scores will be thrown off.

All but a few tools expect Sanger Phred+33 quality scores, designated as fastqsanger and fastqsanger.gz in Galaxy.
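To make the scaling difference concrete: in Sanger Phred+33 a quality score Q is stored as the character with ASCII code Q + 33, while the legacy Illumina Phred+64 format stores it as Q + 64. The sketch below guesses the offset from the lowest character seen in the quality lines. The function name `guess_phred_offset` and the cutoffs are illustrative assumptions based on the commonly cited ASCII ranges; the guess is a heuristic, since Phred+33 reads made up only of very high scores can look like Phred+64.

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Hypothetical helper. Returns None when there is no data or when the
    lowest character falls in the ambiguous range (ASCII 59 to 63,
    which is also used by the old Solexa scaling).
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        # Characters this low cannot occur in a +64 based encoding.
        return 33
    if lowest >= 64:
        # Consistent with Phred+64, but only a guess (see lead-in).
        return 64
    return None


if __name__ == "__main__":
    print(guess_phred_offset(["IIII!!II"]))   # contains '!' (ASCII 33)
    print(guess_phred_offset(["hhhhffff"]))   # all characters >= ASCII 64
```

The more reads are sampled, the more reliable the guess, because low-quality bases are what give a Phred+33 file away.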

You can load up other formats, and make adjustments (tools will be used to standardize the quality score + annotation line, then re-scale the quality scores themselves). I wrote up some FAQs back when this was a more common necessity, and those should all still work if you want to try. Warning: a bit complicated!

1. Let the Upload tool detect the format
2. Adjust the + lines first (this should repair the FastQC issue)
3. Then “groom” if needed

See here for all → Galaxy FAQs.
And, there was another recent discussion about this if you want more details about the “why” and exact steps. → Faster Download and Extract Reads in FASTQ and ENA reads are slightly different - #2 by jennaj
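For anyone curious what the two adjustments above amount to, here is a minimal local sketch. It is not the code behind the Galaxy tools: the function name `to_fastqsanger` is hypothetical, it assumes four-line records, and it only handles a plain Phred+64 to Phred+33 shift (the older Solexa scaling needs a different, non-linear conversion and is not covered).

```python
def to_fastqsanger(lines, source_offset=64):
    """Yield FASTQ lines with a bare '+' line and Phred+33 qualities.

    Hypothetical helper. With source_offset=33 only the '+' line is
    cleaned; with source_offset=64 each quality character is also
    shifted down by 31 ASCII positions.
    """
    shift = source_offset - 33
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (ln.rstrip("\n") for ln in lines[i:i + 4])
        if not plus.startswith("+"):
            raise ValueError(f"record {i // 4 + 1}: third line is not a '+' line")
        shifted = [ord(ch) - shift for ch in qual]
        if any(code < 33 for code in shifted):
            # A character below '!' means the assumed source offset is wrong.
            raise ValueError(f"record {i // 4 + 1}: qualities are not Phred+{source_offset}")
        yield header
        yield seq
        yield "+"  # drop any annotation repeated after the '+'
        yield "".join(chr(code) for code in shifted)


if __name__ == "__main__":
    record = ["@read1", "ACGT", "+read1", "hhhh"]
    print("\n".join(to_fastqsanger(record, source_offset=64)))
```

Running the shift on data that is already Phred+33 would corrupt the scores, which is why the sketch raises an error rather than guessing, and why checking the encoding first is worth the extra step.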

Or, you can get the SRR reads from NCBI already in fastqsanger format. These will already have the + annotation line standardized, and data points rescaled (if needed based on the original sequencing protocol), directly from the archives, with either of these two tools:

Into collection folders with → Faster Download and Extract Reads in FASTQ format from NCBI SRA
Into individual datasets with → Download and Extract Reads in FASTQ format from NCBI SRA

Please give those a review, thanks!
