I am trying to load some data into Galaxy from NCBI. However, when I run the SRR numbers, most of them come in as only 1 file, as single-end reads, and not as paired-end. My data is paired. Here are some examples of how my data is unpacked:
This accession has two technical reads and one biological read per spot.
The NCBI tools are sorting the read data in different ways.
Faster Download and Extract Reads in FASTQ is only extracting the single biological read with the default settings. That single biological read is sorted into a single-end collection.
Download and Extract Reads in FASTA/Q is extracting all three reads into one result dataset. It sounds like this is what you want from your examples.
An alternative is to capture the URLs from the data source and paste those into the Upload tool: Run Browser : Browse : Sequence Read Archive : NCBI/NLM/NIH. This would give the same results as the Download and Extract Reads in FASTA/Q tool.
Once you have the reads uploaded, sort them directly and optionally create collections. Examples of how to sort reads by identifiers are in this FAQ. Use the regular expression matching options, not the “de-interlacer” tool.
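If it helps to see the idea outside of Galaxy, here is a rough sketch of what sorting reads by identifier with a regular expression amounts to. It is only an illustration: the identifier layout (`@<spot name>/<read number>`), the pattern, and the function name `split_by_read_number` are all assumptions made up for this example, so check the real identifiers in your dataset and adjust the pattern before relying on it.

```python
import re
from collections import defaultdict

# Hypothetical identifier layout: "@<spot name>/<read number>", e.g. "@SPOT.7/2".
# This is an assumption for illustration; adjust the pattern to your own data.
READ_ID = re.compile(r"^@(\S+)/(\d+)(?:\s|$)")


def split_by_read_number(fastq_text):
    """Group 4-line FASTQ records by the read number found in the identifier."""
    lines = fastq_text.strip().splitlines()
    groups = defaultdict(list)
    for i in range(0, len(lines), 4):
        record = lines[i:i + 4]
        match = READ_ID.match(record[0])
        # Records whose identifier does not fit the pattern are kept separately.
        key = match.group(2) if match else "unmatched"
        groups[key].append("\n".join(record))
    return dict(groups)


# Tiny made-up dataset: two spots, three reads per spot, all in one file.
example = "\n".join([
    "@SPOT.1/1", "ACGT", "+", "IIII",
    "@SPOT.1/2", "TTGCA", "+", "IIIII",
    "@SPOT.1/3", "GGCC", "+", "IIII",
    "@SPOT.2/1", "ACGA", "+", "IIII",
    "@SPOT.2/2", "TTGCC", "+", "IIIII",
    "@SPOT.2/3", "GGCA", "+", "IIII",
])

groups = split_by_read_number(example)
for read_number in sorted(groups):
    print(read_number, len(groups[read_number]), "records")
```

Each group then corresponds to one read per spot. Which groups are the technical reads and which is the biological read depends on the accession, so confirm that against the run's metadata rather than assuming a fixed order.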
Thank you for the reply.
I am totally new to bioinformatics and Galaxy. When you say two technical reads, do you mean paired-end reads? From your answer, I understand that the dataset is paired and should therefore be downloaded as FASTA/Q instead of FASTQ if I want to do some downstream analysis?
However, I got the full FASTQ dump for one dataset with the FASTQ download.
Picture attached (SRR8368415). That's why I got very confused by the other data.