SRR number not coming in as one single fastq dump file but single end file

HG123 · January 28, 2022, 6:59pm

Dear Galaxy helper,

I am trying to load some data into Galaxy from NCBI. However, when I run the SRR numbers, most of them come in as only one file for single-end reads and not as paired-end data. My data is paired. Here are some examples of how my data is unpacked.

When I open the single-end data, it looks like this:
[Screenshot 2022-01-28 at 18.57.53]

I am new to Galaxy, but I assume my FASTQ file should come in like this

Only one of them shows it like this.

Thank you in advance

jennaj · January 28, 2022, 9:05pm

Hi @HG123

This accession has two technical reads and one biological read per spot.

The NCBI tools are sorting the read data in different ways.

Faster Download and Extract Reads in FASTQ is only extracting the single biological read with the default settings. That single biological read is sorted into a single-end collection.
Download and Extract Reads in FASTA/Q is extracting all three reads into one result dataset. It sounds like this is what you want from your examples.

An alternative is to capture the URLs from the data source and paste them into the Upload tool: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH. This would give the same result as the Download and Extract Reads in FASTA/Q tool.

Once you have the reads uploaded, sort them directly and optionally create collections. Examples of how to sort reads by identifiers are in this FAQ. Use the regular expression matching options, not the “de-interlacer” tool.
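For readers who want to see what sorting reads by identifier with a regular expression amounts to outside of Galaxy, here is a minimal Python sketch. It is not what the Galaxy tools run internally. It assumes each record's identifier ends in `/1`, `/2`, or `/3` to mark the read number within a spot; that layout, the `READ_NUMBER` pattern, the `split_by_read_number` helper, and the example records are all hypothetical, so adjust the pattern to whatever your own identifiers look like.

```python
import re
from collections import defaultdict

# Assumed identifier layout: "@<spot_id>/<read_number>", optionally followed
# by a description. This is a hypothetical pattern, not a fixed SRA convention.
READ_NUMBER = re.compile(r"^@(\S+?)/(\d+)(?:\s|$)")


def split_by_read_number(fastq_text):
    """Group 4-line FASTQ records by the read number in their identifier."""
    lines = fastq_text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    groups = defaultdict(list)
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        match = READ_NUMBER.match(record[0])
        if match is None:
            raise ValueError(f"identifier does not match the assumed pattern: {record[0]}")
        groups[int(match.group(2))].append("\n".join(record))
    return dict(groups)


# Made-up records: two spots, three reads per spot.
example = """@spot1/1
ACGT
+
IIII
@spot1/2
TTGCA
+
IIIII
@spot1/3
GGGGCCCC
+
IIIIIIII
@spot2/1
TGCA
+
IIII
@spot2/2
AACGT
+
IIIII
@spot2/3
CCCCGGGG
+
IIIIIIII
"""

groups = split_by_read_number(example)
for read_number in sorted(groups):
    print(f"read {read_number}: {len(groups[read_number])} records")
```

Each group could then be written to its own file, which corresponds to having one dataset per read type before building a collection.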

Related Q&A: Failing to load single-cell raw fastq files into Galaxy

HG123 · January 29, 2022, 7:30am

Hi Jenna,

Thank you for the reply.
I am totally new to bioinformatics and Galaxy. When you say two technical reads, do you mean paired-end reads? From your answer I understand that the dataset is paired and should therefore be downloaded as FASTA/Q instead of FASTQ if I want to do some downstream analysis?

However, I got the full FASTQ dump for one dataset with the FASTQ download (picture attached, SRR8368415). That's why I got very confused by the other data.

Please bear with me for asking simple questions.

jennaj · February 3, 2022, 7:23pm

Hi @HG123

Data source at NCBI: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH

The sequencing protocol is described in the Metadata tab (link outs there provide more details)
The read types are broken down in the Reads tab

Galaxy Training Network (GTN) tutorials: https://training.galaxyproject.org/

Tutorials can be searched with keywords or browsed in topic categories.
Example single-cell
Example 10X

There is a training event in March that you might want to consider joining. Single-cell analysis will be covered.

All events: Galaxy Event Horizon - Galaxy Community Hub
From that page find the GTN Smorgasbord2 (Tapas) event: GTN Smörgåsbord 2: 14-18 March | Gallantries
