Error recognizing FASTQ format for SRR files from NCBI

Andrew_Berti · September 30, 2024, 3:35pm

I’m trying to assemble publicly available whole-genome SRA runs for Staphylococcus aureus. Upon deposition, NCBI interleaves the two submitted “R1” and “R2” FASTQ files into one. When I download the interleaved file from NCBI and upload it to Galaxy.org, it is recognized as “txt” instead of something usable like “fastqsanger.gz”. Any idea how I can modify the downloaded interleaved file so that Galaxy recognizes the proper file type? I don’t have a cloud delivery service, so I can’t just download the input files…

Interleaved reads from SRR10591328.fastq.gz (and the extracted file) are detected as txt, while the original submitted (non-interleaved) runs are detected as the correct fastqsanger.gz file type.
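Galaxy picks a datatype by inspecting the file contents, so one quick local check is whether the downloaded file even has the shape a Sanger FASTQ parser expects. The Python sketch below is a rough structural check only, not Galaxy's own sniffer; the function names are hypothetical. It assumes unwrapped four-line records and Phred+33 qualities, and it looks only at the first records.

```python
import gzip
from itertools import islice


def looks_like_fastq(lines, max_records=100):
    """Rough check of the first FASTQ records (hypothetical helper,
    not Galaxy's datatype sniffer)."""
    lines = [line.rstrip("\r\n") for line in islice(lines, 4 * max_records)]
    if len(lines) < 4:
        return False
    n_full = len(lines) // 4 * 4  # ignore a trailing partial record
    for i in range(0, n_full, 4):
        header, seq, plus, qual = lines[i:i + 4]
        # Line 1 must start with '@', line 3 with '+'
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        # One quality character per base
        if len(seq) != len(qual):
            return False
        # Sanger (Phred+33) qualities are printable ASCII '!' (33) to '~' (126)
        if any(not 33 <= ord(c) <= 126 for c in qual):
            return False
    return True


def file_looks_like_fastq(path):
    """Open plain or gzip-compressed text and run the check above."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return looks_like_fastq(handle)
```

For example, `file_looks_like_fastq("SRR10591328.fastq.gz")` returning `False` would point to a content problem (for example, wrapped or non-FASTQ lines) rather than just a naming issue.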

jennaj · September 30, 2024, 4:21pm

Hi @Andrew_Berti

We have a guide here that might help. → Getting Data into Galaxy

That guide links to tutorials, and I would suggest these to start with. →

Tutorials for Faster Download and Extract Reads in FASTQ format from NCBI SRA
The simplest methods are in the first steps of this tutorial. → Hands-on: Unicycler assembly of SARS-CoV-2 genome with preprocessing to remove human genome reads (Assembly)

I started a test run in this history using the same tool with your accession to see what happens. This is a shared history, so you can click on that link to see how it worked. → https://usegalaxy.org/u/jen-galaxyproject/h/test-srr10591305

I see a “list paired collection” result, not interleaved. But you can change how that data is organized: interleaved, separate files, plus various collection folder shapes.
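If you would rather split an interleaved download yourself before uploading, the reorganization described above amounts to sending alternate records to two files. A minimal Python sketch, assuming strictly alternating R1/R2 records of four lines each (no wrapped sequence lines) and gzip-compressed input; the function names are hypothetical, and it does not verify that mate names match.

```python
import gzip
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def split_pairs(lines):
    """Pair consecutive records: 1st/3rd/5th... go to R1, 2nd/4th/6th... to R2."""
    records = read_records(lines)
    for forward in records:
        reverse = next(records, None)
        if reverse is None:
            raise ValueError("odd number of records: input is not interleaved pairs")
        yield forward, reverse


def deinterleave_file(path_in, path_r1, path_r2):
    """Write gzip-compressed R1/R2 files; returns the number of pairs written."""
    pairs = 0
    with gzip.open(path_in, "rt") as src, \
         gzip.open(path_r1, "wt") as r1, \
         gzip.open(path_r2, "wt") as r2:
        for forward, reverse in split_pairs(src):
            r1.writelines(forward)
            r2.writelines(reverse)
            pairs += 1
    return pairs
```

Usage might look like `deinterleave_file("SRR10591328.fastq.gz", "SRR10591328_1.fastq.gz", "SRR10591328_2.fastq.gz")`. Within Galaxy itself, reshaping the collection as described above avoids the local step entirely.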

Please give all of this help a review, and if I am misunderstanding, please explain a bit more. Then, if you need to change how your data is organized, and are not sure how to do that in Galaxy, we can help with that too. We’d need to know how the data is organized now, and what tool you plan to use next.

Let’s start there and please let us know if you get this working!

Why change the “shape” of data? Galaxy hosts tools written by many different authors. Those authors had different data expectations! You can quickly change how your data is organized at any time. Extract those steps into a workflow for easy reuse.

Recent Galaxy News → Workflows Workflows Workflows! (Galaxy Community Hub)

Andrew_Berti · September 30, 2024, 4:57pm

You are so very awesome! I really need to work on my Galaxy bioinformatics tool knowledge!

Andrew_Berti · September 30, 2024, 4:57pm

(running the assembler now…since it is a small genome should have results shortly but…it’s running which is progress!)

jennaj · September 30, 2024, 5:10pm

Great, I’m so so glad that helps!
