Separate/split replicates in single SRA file

Medine_K · June 2, 2020, 2:35pm

Hello,

I’m new to Galaxy and working on an RNA-seq experiment with 2 conditions and 3 replicates each. The NCBI SRA accession number is SRP154796.

To perform DGE analysis with DESeq2 or edgeR, I need a separate count file for each replicate (that is what I understood after reading a few posts here). However, I don’t have access to the original fastq files on NCBI and can only download a single SRA file in which all the replicates are merged.

I’ve used a Galaxy tool to convert my SRA file to fastq format, but I get a single fastq file with interleaved reads. When I download the files from ENA, the reads are split into forward/reverse, but the replicates are still merged:
https://www.ebi.ac.uk/ena/data/view/SRX4415824
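The difference between the Galaxy output (one interleaved file) and the ENA download (separate forward/reverse files) is only how the mates are laid out. As a minimal sketch, assuming an uncompressed FASTQ where every record is exactly four lines and each mate 1 record is immediately followed by its mate 2, the interleaved file can be split like this. The function names are hypothetical, not a Galaxy tool. Note that this only separates mates; it cannot recover replicates that were never submitted as separate runs.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq_records(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield 4-line FASTQ records as (header, sequence, '+' line, qualities)."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record after " + header.strip())
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header.strip())
        yield header, seq, plus, qual


def deinterleave(src: TextIO, fwd: TextIO, rev: TextIO) -> int:
    """Send odd-numbered records to fwd (mate 1), even-numbered to rev (mate 2).

    Assumes strictly alternating mates; returns the number of pairs written.
    """
    records = read_fastq_records(src)
    pairs = 0
    for mate1 in records:
        mate2 = next(records, None)  # the partner must directly follow mate 1
        if mate2 is None:
            raise ValueError("odd number of records: input is not interleaved pairs")
        fwd.writelines(mate1)
        rev.writelines(mate2)
        pairs += 1
    return pairs


# Tiny in-memory demo with one hypothetical read pair.
demo = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGC\n+\nIIII\n")
out1, out2 = io.StringIO(), io.StringIO()
print(deinterleave(demo, out1, out2))    # 1
print(out1.getvalue().splitlines()[0])   # @r1/1
print(out2.getvalue().splitlines()[0])   # @r1/2
```

For real files, open the input and the two outputs with `open(...)` instead of `io.StringIO`; gzipped input would need `gzip.open(path, "rt")`.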

I couldn’t find a way to separate the replicates, so I only have 2 count files for DESeq2/edgeR and constantly get an error.

Does anyone know how I can get the original fastq files for all the replicates? Or is the problem in the way I’m using DESeq2/edgeR?

Thank you in advance.

jennaj · June 4, 2020, 7:39pm

From a review, it appears that either only the biological reads were published or the EBI SRA file is formatted in a problematic way. There are no “original submission” fastq data for either paired-end sample. That usually indicates the original data was published somewhere else, but I didn’t find it at NCBI’s SRA. Parsing the EBI-sourced SRA file with NCBI’s SRA Toolkit failed, but you could still explore the data that way on the command line (this won’t work in Galaxy due to format issues in the SRA file itself).

But none of that will help with your analysis. Technical replicates are not appropriate for differential expression analysis; they are used to evaluate the quality of different sequencing runs of the same biological sample. These tools require at least two conditions with at least two biological replicates each for a valid expression analysis. Biological replicates are (or rather, should be) published as distinct runs, and this data appears to have only one paired-end run (one biological sample) per condition.
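The replicate requirement can be expressed as a simple check on a sample sheet before running DESeq2 or edgeR. This is a hedged sketch, not part of either tool: it assumes one entry per biological sample (one run, one count file) mapped to its condition, with technical replicates already combined into their biological sample. The function and sample names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def check_design(sample_conditions: Dict[str, str], min_reps: int = 2) -> List[str]:
    """List problems with a DE design given {biological sample: condition}.

    Requires at least two conditions, each with at least `min_reps`
    biological replicates. Assumes technical replicates were already
    combined into their biological sample before building the mapping.
    """
    per_condition = Counter(sample_conditions.values())
    problems = []
    if len(per_condition) < 2:
        problems.append(f"need at least 2 conditions, found {len(per_condition)}")
    for condition, n in sorted(per_condition.items()):
        if n < min_reps:
            problems.append(
                f"condition '{condition}' has {n} biological replicate(s), need {min_reps}"
            )
    return problems


# One paired-end run per condition, as described in this thread.
print(check_design({"sample_A": "control", "sample_B": "treated"}))
```

With only one run per condition, the check reports both conditions as having a single biological replicate, which matches the situation described above.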

FAQs related to fastq data are near the top of this Support FAQ group, and DE tools are covered in the last one: Galaxy Support - Galaxy Community Hub

Thanks!

Medine_K · June 4, 2020, 9:09pm

Thank you very much for your reply and this detailed explanation!! I was really going crazy because I couldn’t understand. Have a wonderful day and thank you again!
