FastQC fails in MRSA Genome Assembly tutorial

Hi,

I am trying to get acquainted with Galaxy by working through tutorials, but I keep running into problems.
For example, I am trying the "Genome Assembly of a bacterial genome (MRSA) sequenced using Illumina MiSeq Data" tutorial. After loading and renaming the data, the FastQC step fails on one of the two datasets (DRR187559_2):
"An error occurred with this dataset:
format txt database ?
application/bz2 Failed to process file"
I could not find the tool "FastQC (Galaxy version 0.74+galaxy0)" shown in the tutorial, so I used the closest one I found: "FastQC Read Quality reports (Galaxy Version 0.74+galaxy0)".
Is that the problem? What should I have done differently?

I am on usegalaxy.eu and the history is at
Galaxy

Thanks!

Welcome, @M_Plank

Thanks for sharing the history and explaining what is going on. The tool is fine; the problem seems to have been introduced during Upload.

The tool is reporting that the input fastq data is truncated. You probably need to upload it again (it comes from a tutorial, so it should be complete). Later on, when using your own data, you might also need to check the data at its source to find out where the problem was introduced.

Ran out of data in the middle of a fastq entry. Your file is probably truncated

These messages can appear in the job logs (open the job details with the "i" icon), and sometimes directly on the expanded dataset in the history view. It depends on the tool, but reviewing the inputs, parameters, and logs is always the best place to start troubleshooting. See if you can find the message for this job; your example shows the same message in a few places.
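As a rough illustration of what "Ran out of data in the middle of a fastq entry" means, here is a minimal Python sketch (not FastQC's actual code) that walks a FASTQ file four lines at a time and complains if the data ends partway through a record. It assumes the common layout of exactly four lines per record (header, sequence, `+`, quality); `count_fastq_records` and `open_fastq` are hypothetical helper names, and choosing the decompressor by file extension is also an assumption.

```python
import bz2
import gzip
from typing import Iterable, TextIO


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count complete 4-line FASTQ records.

    Raises ValueError if a record is malformed or the data stops mid-record.
    Simplified check: assumes one-line sequences and qualities.
    """
    record = []
    n_records = 0
    for line_no, line in enumerate(lines, start=1):
        # Tolerate blank lines *between* records only.
        if not record and not line.strip():
            continue
        record.append(line.rstrip("\r\n"))
        if len(record) < 4:
            continue
        header, seq, sep, qual = record
        if not header.startswith("@"):
            raise ValueError(f"line {line_no - 3}: header does not start with '@'")
        if not sep.startswith("+"):
            raise ValueError(f"line {line_no - 1}: separator does not start with '+'")
        if len(seq) != len(qual):
            raise ValueError(f"record ending at line {line_no}: sequence/quality length mismatch")
        n_records += 1
        record = []
    if record:
        # Same situation FastQC reports: the file ended inside a record.
        raise ValueError("ran out of data in the middle of a FASTQ record; file probably truncated")
    return n_records


def open_fastq(path: str) -> TextIO:
    """Open plain, .gz, or .bz2 FASTQ as text, picking the format by extension (assumption)."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")
```

For example, `with open_fastq("DRR187559_2.fastqsanger.bz2") as fh: print(count_fastq_records(fh))` either prints the number of reads or raises a `ValueError` describing where the structure breaks.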

More navigation help → FAQ: Troubleshooting errors

General Upload help → Getting Data into Galaxy

Hi jennaj,

thank you!
It's helpful to see these error messages and to understand that the problem occurred during upload.
I loaded the file directly from NCBI SRA this time and FastQC worked. Then I tried again with the file from the tutorial link (https://zenodo.org/record/4534098/files/DRR187559_2.fastqsanger.bz2) and it failed again. The file sizes differ between the two upload methods. Is it possible that the file on zenodo.org is already corrupted?

Thank you.


Hi @M_Plank

I'm checking the tutorial data right now and will follow up with more feedback today.

Hi @M_Plank

The fastq data itself seems OK, so I suspect the problem is with the compression of that second file. We'll figure out exactly what is going on and fix it in the tutorials/Zenodo.
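If you want to inspect a compressed download yourself, the sketch below decompresses a `.bz2` file one stream at a time with Python's standard `bz2.BZ2Decompressor`, counting the streams and the decompressed bytes, and raising an error if the data stops before an end-of-stream marker. It is only a diagnostic idea, not what was actually found for this file: a `.bz2` file can legally hold several concatenated streams, and this shows whether the compressed data is complete and how it is laid out. `inspect_bz2` is a hypothetical helper name.

```python
import bz2


def inspect_bz2(path: str, chunk_size: int = 1 << 20) -> tuple[int, int]:
    """Return (number of bz2 streams, total decompressed bytes) for a .bz2 file.

    Raises EOFError if the data ends before an end-of-stream marker, and
    OSError if some bytes are not valid bz2 data (including trailing junk).
    """
    n_streams = 0
    n_bytes_out = 0
    decomp = bz2.BZ2Decompressor()
    fed = False      # has the current decompressor received any input yet?
    pending = b""    # bytes left over after the previous stream ended
    with open(path, "rb") as fh:
        while True:
            data = pending if pending else fh.read(chunk_size)
            pending = b""
            if not data:
                break
            fed = True
            n_bytes_out += len(decomp.decompress(data))
            if decomp.eof:
                # One complete stream finished; any remaining bytes start the next one.
                n_streams += 1
                pending = decomp.unused_data
                decomp = bz2.BZ2Decompressor()
                fed = False
    if fed:
        raise EOFError("bz2 data ends before the end-of-stream marker (truncated?)")
    return n_streams, n_bytes_out
```

A file that reports more than one stream is still valid bz2, but not every decompressor handles multi-stream input the same way, which is why the stream count can be worth knowing when two tools disagree about the same file.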

Meanwhile, try this:

  1. Load the two tutorial files into a new history. You can delete the history you were working in before if you need to recover space, or just don't want to get mixed up.

  2. Click on the pencil icon for each of those two files, one at a time, and convert to the uncompressed format → fastqsanger

  3. Run through the tutorial, or use the tutorial’s workflow, with the uncompressed versions of the data instead of the bz2 compressed files.
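Outside Galaxy, the same uncompress step can be approximated locally once the `.bz2` file is downloaded. This is a minimal Python sketch, not the converter Galaxy runs; `bunzip2_to_fastq` is a hypothetical helper name. Python's `bz2.open` reads all concatenated streams of a multi-stream file.

```python
import bz2
import shutil


def bunzip2_to_fastq(src: str, dst: str, chunk_size: int = 1 << 20) -> int:
    """Decompress a .bz2 file (src) into a plain fastqsanger file (dst).

    Streams the data in chunks so large read files are never held in memory.
    Returns the number of decompressed bytes written.
    """
    with bz2.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
        return fout.tell()
```

For example, `bunzip2_to_fastq("DRR187559_2.fastqsanger.bz2", "DRR187559_2.fastqsanger")` writes the uncompressed reads next to the original file.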

This worked for me. The test/example history is here: https://usegalaxy.org/u/jen-galaxyproject/h/genome-assembly-of-mrsa-using-illumina-miseq-data

Here is what to select under the pencil icon → Edit Attributes → Datatypes tab. For this use case, uncompress after the data is loaded into Galaxy, and do the same on both files. Later on, you can learn how to put files into a collection and change the datatype the same way, but as a batch action.

Step 1 – Click on the pencil icon


Step 2 – Open the Datatypes tab of the Edit Attributes form in the center panel and review the conversion choices

Step 3 – Choose fastqsanger as the new datatype, then submit the job by clicking on Create Dataset

Two new files will be added to the history. Let those process. You can rename the datasets if you want to, or just use them as they are.

Thanks for following up and let us know if you have more problems!

Ticket → Odd compression on the R2.bz2 file for mrsa-illumina tutorial (galaxyproject/training-material Issue #4730 on GitHub)

Thank you. This worked for me as well.


Thanks to the ticket filed by @jennaj, we have created a new record that will work correctly going forward. Thanks for reporting this issue, @M_Plank; I had never had enough reason to track down the cause before, and it was an interesting result.
