SRR accessions come in as single-end files instead of one single FASTQ dump file

Dear Galaxy helper,

I am trying to load some data into Galaxy from NCBI. However, when I run the SRR numbers, most of them come in as only one file under single-end reads and nothing under paired-end. My data is paired. Here are some examples of how my data is unpacked:


When I go into the single-end data, it looks like this:
[Screenshot 2022-01-28 at 18.57.53]

I am new to Galaxy, but I assume my FASTQ file should come in like this:

Only one of them shows it like this.

Thank you in advance


Hi @HG123

This accession has two technical reads and one biological read per spot.

  1. The NCBI tools sort the read data in different ways.
  • Faster Download and Extract Reads in FASTQ extracts only the single biological read with the default settings. That single biological read is sorted into a single-end collection.

  • Download and Extract Reads in FASTA/Q extracts all three reads into one result dataset. From your examples, it sounds like this is what you want.

  2. An alternative is to capture the URLs from the data source and paste those into the Upload tool: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH. This would give the same results as the Download and Extract Reads in FASTA/Q tool.

Once you have the reads uploaded, sort them directly and optionally create collections. Examples of how to sort reads by identifiers are in this FAQ. Use the regular expression matching options, not the “de-interlacer” tool.
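To show the idea behind sorting reads by identifier with a regular expression, here is a minimal Python sketch. It is not what the Galaxy tools run internally. The function name `partition_fastq`, the group names, and the `/1` and `/2` identifier suffixes are all assumptions made for illustration; the identifiers in your own data may look different, so check a few header lines before choosing a pattern.

```python
import re
from collections import defaultdict


def partition_fastq(lines, patterns):
    """Group 4-line FASTQ records by the first regex that matches the header.

    `patterns` is a list of (group_name, regex) pairs. Records whose header
    matches none of the patterns are collected under "unmatched".
    """
    compiled = [(name, re.compile(rx)) for name, rx in patterns]
    groups = defaultdict(list)
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:  # header, sequence, separator, qualities
            header = record[0]
            for name, rx in compiled:
                if rx.search(header):
                    groups[name].append(tuple(record))
                    break
            else:
                groups["unmatched"].append(tuple(record))
            record = []
    return dict(groups)


# Hypothetical identifiers; real headers from SRA may look different.
example = [
    "@spot1/1", "ACGT", "+", "IIII",
    "@spot1/2", "TTGA", "+", "IIII",
    "@spot2/1", "GGCC", "+", "IIII",
    "@spot2/2", "CCAA", "+", "IIII",
]
patterns = [("forward", r"/1$"), ("reverse", r"/2$")]
groups = partition_fastq(example, patterns)
print({name: len(records) for name, records in groups.items()})
# prints {'forward': 2, 'reverse': 2}
```

For an accession with three reads per spot, a third pattern could be added in the same way, and anything that lands in "unmatched" is a hint that the pattern needs adjusting.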

Related Q&A: Failing to load single-cell raw fastq files into Galaxy


Hi Jenna,

Thank you for the reply.
I am totally new to bioinformatics and Galaxy. When you say two technical reads, do you mean paired-end reads? From your answer, I understand that the dataset is paired and should therefore be downloaded as FASTA/Q instead of FASTQ if I want to do some downstream analysis?

However, I got a complete FASTQ dump for one dataset with the FASTQ download (picture attached, SRR8368415). That is why I got very confused by the other data.

Please bear with me for asking simple questions.

Hi @HG123

Data source at NCBI: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH

  • The sequencing protocol is described in the Metadata tab (link outs there provide more details)
  • The read types are broken down in the Reads tab

Galaxy Training Network (GTN) tutorials: https://training.galaxyproject.org/

  • Tutorials can be searched with keywords or browsed in topic categories.
  • Example single-cell
  • Example 10X

There is a training event in March that you might want to consider joining. Single-cell analysis will be covered.
