paired-end RNA seq with biological replicates

s.godbole · February 1, 2022, 10:27am

Dear Galaxy Team,

I would like to perform a paired-end RNA-seq analysis of 21 samples, each of which has a biological replicate. How should I build dataset lists for such a cohort? Should one list contain all the R1 files and another all the R2 files, so that I can run each command on all samples at once? And how should I account for the biological replicates?

Thank you very much in advance for your help!

Best,
Shweta

gallardoalba · February 1, 2022, 1:58pm

Hi @s.godbole,
Could you give me some more information about what type of analysis you want to perform?

Regards

s.godbole · February 1, 2022, 2:39pm

Hey @gallardoalba,

Thanks for your reply! I have paired-end raw RNA-seq data for 7 different cell lines, each with 3 biological replicates, and each biological replicate has 2 technical replicates. Of these 7 cell lines, 3 belong to one subgroup and the other 4 to another. I would like to perform all the steps, from QC, trimming, and mapping through to differential expression between these two groups. I hope this makes sense.
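To make the size of this design concrete, here is a minimal sketch that enumerates it. The cell-line and group names are hypothetical placeholders (the thread only gives the counts: 7 lines split 3 + 4, 3 biological replicates, 2 technical replicates, paired-end reads). Each biological replicate becomes one row of the sample table, labelled with the group used for the comparison.

```python
from itertools import product

# Hypothetical labels: only the numbers (3 + 4 cell lines) come from the thread.
GROUPS = {
    "groupA": ["lineA1", "lineA2", "lineA3"],
    "groupB": ["lineB1", "lineB2", "lineB3", "lineB4"],
}
BIO_REPS = [1, 2, 3]
TECH_REPS = [1, 2]
READS = ["R1", "R2"]


def sample_table():
    """One row per biological replicate, i.e. per sample after the
    technical replicates have been merged."""
    rows = []
    for group, lines in GROUPS.items():
        for line, rep in product(lines, BIO_REPS):
            rows.append({"sample": f"{line}_BR{rep}", "cell_line": line, "group": group})
    return rows


rows = sample_table()
n_libraries = len(rows) * len(TECH_REPS)  # sequenced libraries (technical replicates)
n_fastq = n_libraries * len(READS)        # FASTQ files, R1 and R2 per library
print(len(rows), n_libraries, n_fastq)    # 21 42 84
```

So the 21 "samples" are the biological replicates, backed by 42 libraries and 84 FASTQ files.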

Since I have so many files, I would like to know how I should build my dataset lists so that I can run the same command on all samples.

Thanks once again for your reply, and I look forward to your response,

Best,
Shweta

gallardoalba · February 1, 2022, 5:15pm

Hi @s.godbole,
There are different options. Since you have technical replicates, I suggest you build a list of dataset pairs for each sample and its technical replicate. Assuming that you are going to follow this pipeline:

Cutadapt → RNASTAR → featureCounts → DESeq2

You can work with those collections up to the DESeq2 stage. Before DESeq2, you need to merge the technical replicates, which can be done with two tools:

Column join on multiple datasets: it collapses the technical replicate collections into a single count file with three columns (GeneID, counts for replicate 1, and counts for replicate 2).
Table Compute (computes operations on table data): it lets you sum the counts of the two replicates for each gene.
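The join-and-sum step can be sketched outside Galaxy as follows. This is a minimal illustration, not what Column join or Table Compute run internally; it assumes each count table is two tab-separated columns (gene ID, raw count) with one header line. Summing raw counts keeps the input un-normalized, which is what DESeq2 expects.

```python
import csv
import io


def read_counts(handle):
    """Read a two-column, tab-separated count table (gene ID, count).
    Assumes exactly one header line."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header line
    return {gene: int(count) for gene, count in reader}


def sum_technical_replicates(*tables):
    """Join the tables on gene ID and add the counts, i.e. the
    Column join + Table Compute steps done in one pass."""
    genes = tables[0].keys()
    for table in tables[1:]:
        if table.keys() != genes:
            raise ValueError("count tables list different gene IDs")
    return {gene: sum(table[gene] for table in tables) for gene in genes}


# Toy data for two technical replicates of one biological replicate.
tr1 = io.StringIO("Geneid\tTR1\nGENE1\t10\nGENE2\t0\n")
tr2 = io.StringIO("Geneid\tTR2\nGENE1\t5\nGENE2\t3\n")
merged = sum_technical_replicates(read_counts(tr1), read_counts(tr2))
print(merged)  # {'GENE1': 15, 'GENE2': 3}
```

Checking that both tables list the same genes guards against silently joining outputs produced with different annotations.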

Regards

s.godbole · February 2, 2022, 8:07am

Dear @gallardoalba ,

Okay… and should each biological replicate be treated as an individual sample? And can these data pairs be further organized into a data list, so that I can run all commands on all the pairs at once?

Thanks once again,

Best,
Shweta

gallardoalba · February 2, 2022, 1:06pm

Hi @s.godbole,
Yes, I suggest you treat the biological replicates as individual samples. You could also create a single data list that includes all of them, but if you do, after running featureCounts you will need to extract the datasets from the collection with Extract dataset and reorganize them a bit in order to merge the technical replicates.
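The "reorganize them a bit" step amounts to grouping the per-library outputs by biological replicate. A minimal sketch, assuming hypothetical element identifiers of the form `<sample>_<bioRep>_<techRep>` (for example `S1_BR1_TR1`); real identifiers will depend on how the collection was named.

```python
from collections import defaultdict


def group_by_biological_replicate(identifiers):
    """Map 'S1_BR1' -> ['S1_BR1_TR1', 'S1_BR1_TR2'] for identifiers of the
    hypothetical form <sample>_<bioRep>_<techRep>."""
    groups = defaultdict(list)
    for ident in identifiers:
        bio_rep, _tech_rep = ident.rsplit("_", 1)  # drop the technical-replicate suffix
        groups[bio_rep].append(ident)
    return {bio_rep: sorted(members) for bio_rep, members in groups.items()}


flat = ["S1_BR1_TR1", "S1_BR2_TR2", "S1_BR1_TR2", "S1_BR2_TR1"]
print(group_by_biological_replicate(flat))
# {'S1_BR1': ['S1_BR1_TR1', 'S1_BR1_TR2'], 'S1_BR2': ['S1_BR2_TR1', 'S1_BR2_TR2']}
```

Each resulting group holds the count files whose columns are then summed into one count file per biological replicate.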

Regards

s.godbole · February 4, 2022, 4:32pm

Dear Cristobal,

Did I understand you correctly when you said I should make data pairs for technical replicates and keep the biological replicates as separate samples?

Since my data are paired-end, I already have two read files (R1 and R2) for one sample, and its technical replicate is also paired-end, so in total I have 4 files. Galaxy would of course not build a dataset pair from 4 files, right?

I am wondering whether I misunderstood something.

Here is an example of the data I have:

Sample 1_Biological replicate 1:
  technical replicate1_R1
  technical replicate1_R2
  technical replicate2_R1
  technical replicate2_R2

Sample 1_Biological replicate 2: again 4 files (2 paired-end technical replicates)

Sample 1_Biological replicate 3: again 4 files (2 paired-end technical replicates)
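Four files per biological replicate fit the list-of-pairs idea: they form two pairs, one per technical replicate, and each pair becomes one element of a list of dataset pairs (a `list:paired` collection in Galaxy terms). The sketch below does that grouping by file name. The naming pattern `S1_BR1_TR1_R1.fastq.gz` is a hypothetical convention; the regex would need adjusting to the real file names.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <sample>_<bioRep>_<techRep>_<read>.fastq.gz
PATTERN = re.compile(
    r"^(?P<sample>S\d+)_(?P<bio>BR\d+)_(?P<tech>TR\d+)_(?P<read>R[12])\.fastq\.gz$"
)


def build_list_of_pairs(filenames):
    """Return {element_id: (forward, reverse)}, one pair per technical replicate."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        element_id = f"{match['sample']}_{match['bio']}_{match['tech']}"
        mates[element_id][match["read"]] = name
    pairs = {}
    for element_id, reads in sorted(mates.items()):
        if set(reads) != {"R1", "R2"}:
            raise ValueError(f"{element_id} is missing a mate: {reads}")
        pairs[element_id] = (reads["R1"], reads["R2"])
    return pairs


files = [
    "S1_BR1_TR1_R1.fastq.gz", "S1_BR1_TR1_R2.fastq.gz",
    "S1_BR1_TR2_R2.fastq.gz", "S1_BR1_TR2_R1.fastq.gz",
]
pairs = build_list_of_pairs(files)
for element_id, (fwd, rev) in pairs.items():
    print(element_id, fwd, rev)
```

The input order does not matter; mates are matched by name, and a missing R1 or R2 raises an error instead of producing a broken pair.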

I really look forward to your response and suggestions,

Best,

Shweta

bjoern.gruening · February 5, 2022, 9:08pm

@s.godbole I highly recommend studying our transcriptomics training material.

There you will find many different cases involving replicates, along with background information.

s.godbole · February 7, 2022, 7:04am

Dear @bjoern.gruening,

Many thanks! I will certainly look into this.

Thank you very much

Best,
Shweta

s.godbole · February 7, 2022, 3:13pm

Hey @gallardoalba @bjoern.gruening,

I looked through all your tutorials, and now I am actually a bit more confused. Would it be better to build a data pair from R1 and R2 of my first technical replicate, do the same for the second technical replicate, build a collection of these pairs, and merge the counts I get later
OR
to build one collection with the R1 files of both technical replicates and another collection with their R2 files, and generate the count file after processing them that way?
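Both layouts hold the same reads; what differs is where the R1/R2 pairing is recorded. In a list of pairs it is explicit in every element, while with two separate lists it exists only implicitly, through matching element identifiers. A minimal sketch of recovering pairs from two such lists, assuming hypothetical identifiers and file names; it illustrates the bookkeeping involved and is not how Galaxy itself matches collections.

```python
def zip_mates(forward, reverse):
    """Pair two parallel {element_id: file} mappings by identifier.
    Fails loudly instead of silently pairing mismatched mates."""
    if forward.keys() != reverse.keys():
        unmatched = forward.keys() ^ reverse.keys()  # identifiers present in only one list
        raise ValueError(f"unmatched identifiers: {sorted(unmatched)}")
    return {ident: (forward[ident], reverse[ident]) for ident in sorted(forward)}


# Hypothetical R1 and R2 lists whose elements are in different orders.
fwd = {"S1_BR1_TR1": "a_R1.fq", "S1_BR1_TR2": "b_R1.fq"}
rev = {"S1_BR1_TR2": "b_R2.fq", "S1_BR1_TR1": "a_R2.fq"}
print(zip_mates(fwd, rev))
# {'S1_BR1_TR1': ('a_R1.fq', 'a_R2.fq'), 'S1_BR1_TR2': ('b_R1.fq', 'b_R2.fq')}
```

With a list of pairs, this matching step is unnecessary because each element already carries both mates.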

Is there a right or wrong way of doing this?

Best,
Shweta
