NCBI SRA Fastq (convert SRA files from GEO into fastq files)


I want to convert SRA files from GEO into fastq files in order to map them to the genome. I uploaded the SRA files directly to Galaxy, but that does not seem to be the right approach.

Now, I’m going through the instructions on the support page of Galaxy: NCBI SRA Fastq

It asks me to manipulate the file before importing it to Galaxy, but I cannot see the mentioned tools on the NCBI SRA Run Selector page (highlighted above). Is there a more detailed explanation or tutorial to follow? The link I shared seems to be what I need, but it is not very clear to me how to proceed.

Thank you.



First you isolate the IDs:

Organizing metadata

The “RunInfo Table” provides the experimental condition and replicate structure of all of the samples. Prior to importing the data, we need to parse this file into individual files that contain the sample IDs of the replicates in each condition. This can be achieved by using a combination of the ‘group’, ‘compare two datasets’, ‘filter’, and ‘cut’ tools to end up with single column lists of sample IDs (SRRxxxxx) corresponding to each condition.
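The same parsing step can be sketched outside Galaxy. Below is a minimal Python sketch, assuming the RunInfo table is a CSV with a `Run` column and a hypothetical `condition` column — check the actual headers in your table before using it:

```python
import csv
from collections import defaultdict

def split_by_condition(runinfo_path):
    """Group SRR accessions by condition from an SRA RunInfo table.

    The column names "Run" and "condition" are assumptions; adjust
    them to match the headers in your own RunInfo table.
    """
    groups = defaultdict(list)
    with open(runinfo_path, newline="") as fh:
        for row in csv.DictReader(fh):
            groups[row["condition"]].append(row["Run"])
    return groups

def write_lists(groups):
    """Write one single-column accession list per condition,
    mirroring the group/filter/cut result in Galaxy."""
    for condition, runs in groups.items():
        with open(f"{condition}_Acc_List.txt", "w") as out:
            out.write("\n".join(runs) + "\n")
```

Each resulting `*_Acc_List.txt` file is a single column of SRRxxxxx IDs, ready to feed to the download tool.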

Then you import the IDs with Download and Extract Reads in FASTA/Q, which corresponds to NCBI SRA Tools (fastq-dump):

What it does:

This tool extracts data (in fastq format) from the Sequence Read Archive (SRA) at the National Center for Biotechnology Information (NCBI). It is based on the fastq-dump utility of the SRA Toolkit.
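Outside Galaxy, the same extraction can be run with the SRA Toolkit directly. As a minimal sketch, here is a Python helper that builds one fastq-dump command line per accession — `--split-files` (separate files for paired-end mates) and `--gzip` (compressed output) are standard fastq-dump options, while the output directory and accessions are placeholders for your own:

```python
def fastq_dump_cmds(accessions, outdir="fastq"):
    """Build the fastq-dump command lines that the Galaxy tool wraps.

    Returns a list of argument vectors; run each with subprocess.run()
    once the SRA Toolkit is installed.
    """
    return [
        ["fastq-dump", "--split-files", "--gzip", "--outdir", outdir, acc]
        for acc in accessions
    ]
```

For example, `fastq_dump_cmds(["SRR000001"])` yields the command `fastq-dump --split-files --gzip --outdir fastq SRR000001`.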


Thank you for your reply!

I have done what’s recommended, and now I have 18 individual fastqsanger.gz files, but I want to group them by condition and replicate. I have 6 different conditions with 3 replicates each.

Any tutorial advice on how to do this? It would be more organized this way before moving on to trimming, alignment, etc. In addition, I have to do the same analysis with a much bigger dataset, so I would be happy to learn it.

This tutorial has not been helpful because my fastqsanger.gz files are not directly in the history pane but inside a folder, “Single-end data (fasterq-dump)”. That is why I cannot see the checkbox in the upper right as shown in the link.

@ysrbrs, do you understand that the datasets are organized in collections?
Could you please paste a screenshot so we can see what your history looks like?


Hi @ysrbrs,
in that case, the fastest way would probably be to upload the datasets in groups, according to your replicates/conditions.



Hi @ysrbrs

That FAQ is intended to help resolve issues around format variations – after the data is in Galaxy. The collections GTN tutorial is an overview of the ways to manipulate data in collections.

@David is correct about the output of the collection from this tool, and the advice from @gallardoalba is a good simple way to get your data organized at the start.

Should you want to explore advanced methods for downloading data in a structured format from the very start, see the GTN tutorials under the section Uploading Data – the two about the “Rule Based Uploader” are what you will be doing; the one above them provides context, and the one below demonstrates usage in the context of a practical example. Many other tutorials cover the Upload/Collection functions – use the search at that site to find them. Even if one is not exactly your use case, it can still be helpful.


Dear all, @David @gallardoalba

Thank you for your responses. I somehow did not see these messages. Sorry for the late reply.

I figured out that, in Galaxy, it’s easier to merge multiple files into a data collection than to split a collection into multiple files.

So, I uploaded multiple Acc_List.txt files in groups according to condition. Each contained only the three SRA Run accessions (one per replicate) for a single condition.