Dear Galaxy Team,
I would like to perform paired-end RNA-seq analysis for 21 samples, each of which has a biological replicate. How should I build a dataset list for such a cohort? Should one list contain all R1 files and the other all R2 files, so that I can run commands on all samples at once? And how should I account for the biological replicates?
Thank you very much in advance for your help!
Best,
Shweta
Hi @s.godbole,
could you provide me with some additional information about which type of analysis you want to perform?
Regards
Hey @gallardoalba ,
Thanks for your reply! I have paired-end raw RNA-seq data for 7 different cell lines, each with 3 biological replicates, and each biological replicate has 2 technical replicates. Of these 7 cell lines, 3 belong to one subgroup and the other 4 to another. I would like to perform all steps, from QC and trimming through mapping to differential expression between these two groups. I hope this makes sense.
Since I have so many files, I would like to know how I should build my dataset lists so that I can run the same command on all samples.
Thanks once again for your reply, and I look forward to your response,
Best,
Shweta
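For readers following along, the design described above can be written out as a small sample sheet. This is only a sketch: the cell-line names and the group labels (`groupA`, `groupB`) are hypothetical placeholders, since the post only states that 3 cell lines belong to one subgroup and 4 to the other.

```python
from itertools import product

# Hypothetical names: the post does not name the cell lines or the subgroups.
GROUPS = {
    "groupA": ["cellline1", "cellline2", "cellline3"],
    "groupB": ["cellline4", "cellline5", "cellline6", "cellline7"],
}
N_BIO = 3   # biological replicates per cell line
N_TECH = 2  # technical replicates per biological replicate


def build_design():
    """Return one row per sequenced library (cell line x bio rep x tech rep)."""
    rows = []
    for group, cell_lines in GROUPS.items():
        for line, bio, tech in product(
            cell_lines, range(1, N_BIO + 1), range(1, N_TECH + 1)
        ):
            rows.append(
                {
                    "group": group,
                    "cell_line": line,
                    "bio_rep": bio,
                    "tech_rep": tech,
                    "library": f"{line}_bio{bio}_tech{tech}",
                }
            )
    return rows


design = build_design()
libraries = len(design)        # paired-end libraries: 7 * 3 * 2
fastq_files = libraries * 2    # one R1 and one R2 file per library
bio_samples = len({(r["cell_line"], r["bio_rep"]) for r in design})

print(libraries, fastq_files, bio_samples)  # 42 84 21
```

The 21 biological samples are the units that enter the differential expression comparison; the 42 libraries are what gets trimmed, mapped, and counted.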
Hi @s.godbole,
there are different options; since you have technical replicates, I suggest you build a list of dataset pairs for each sample and its technical replicate. Assuming that you are going to follow this pipeline:
Cutadapt → RNASTAR → featureCounts → DESeq2
You can work with those collections until the DESeq2 stage. Then you need to merge the technical replicates together in order to use them in DESeq2, which can be done by using two tools:
Regards
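To illustrate what merging the technical replicates means at the count level, here is a minimal Python sketch. It is not the Galaxy tools referred to above; it only assumes the common practice of summing the raw per-gene counts of technical replicates so that each biological replicate ends up with a single count column. The library names and the `merge_technical_replicates` helper are hypothetical.

```python
from collections import defaultdict


def merge_technical_replicates(counts, tech_to_bio):
    """Sum per-gene raw counts of technical replicates.

    counts:      {library_name: {gene_id: raw_count}}
    tech_to_bio: {library_name: biological_sample_name}
    Returns      {biological_sample_name: {gene_id: summed_count}}
    """
    merged = defaultdict(lambda: defaultdict(int))
    for library, gene_counts in counts.items():
        sample = tech_to_bio[library]
        for gene, n in gene_counts.items():
            merged[sample][gene] += n
    return {sample: dict(genes) for sample, genes in merged.items()}


# Toy counts standing in for featureCounts output (hypothetical values).
counts = {
    "s1_bio1_tech1": {"geneA": 10, "geneB": 0},
    "s1_bio1_tech2": {"geneA": 12, "geneB": 3},
    "s1_bio2_tech1": {"geneA": 7, "geneB": 1},
    "s1_bio2_tech2": {"geneA": 9, "geneB": 2},
}
# Assumed naming scheme: dropping the trailing "_techN" gives the biological sample.
tech_to_bio = {name: name.rsplit("_", 1)[0] for name in counts}

merged = merge_technical_replicates(counts, tech_to_bio)
print(merged)
# {'s1_bio1': {'geneA': 22, 'geneB': 3}, 's1_bio2': {'geneA': 16, 'geneB': 3}}
```

The summed table has one column per biological replicate, which is the shape DESeq2 expects when biological replicates are treated as individual samples.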
Dear @gallardoalba ,
Okay… and should each biological replicate be treated as an individual sample? And can these data pairs be further constructed into a data list to run all commands on all the pairs at once?
Thanks once again,
Best,
Shweta
Hi @s.godbole,
yes, I suggest you treat the biological replicates as individual samples. You could also create a data list that includes all of them, but if you do, after running featureCounts you will need to extract the datasets from the collection with Extract dataset and reorganize them a bit in order to merge the technical replicates.
Regards
Dear Cristobal,
Did I understand you correctly when you said I should make data pairs for technical replicates and keep the biological replicates as separate samples?
Since I have paired-end RNA-seq data, I already have 2 read files (R1 and R2) for one sample, and its technical replicate is also paired-end, so in all I have 4 files. Galaxy of course would not build data pairs with 4 files, right?
I was wondering if I misunderstood something?
I can give you an example of the data I have
Sample 1, biological replicate 1:
technical replicate1_R1
technical replicate1_R2
technical replicate2_R1
technical replicate2_R2
Sample 1, biological replicate 2: again 4 files (2 paired-end technical replicates)
Sample 1, biological replicate 3: again 4 files (2 paired-end technical replicates)
I really look forward to your response and suggestions,
Best,
Shweta
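One way to read the layout above: the 4 files of a biological replicate form 2 pairs, one R1/R2 pair per technical replicate, so each pair still has exactly 2 files. The sketch below groups file names that way. The naming pattern `<sample>_bio<N>_tech<N>_R<1|2>.fastq.gz` is an assumption made for illustration, not the actual file names; `forward` and `reverse` are the element names Galaxy uses inside a dataset pair.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) naming scheme: <sample>_bio<N>_tech<N>_R<1|2>.fastq.gz
PATTERN = re.compile(
    r"^(?P<sample>.+)_bio(?P<bio>\d+)_tech(?P<tech>\d+)_R(?P<read>[12])\.fastq\.gz$"
)


def pair_fastq_files(filenames):
    """Group FASTQ files into one forward/reverse pair per technical replicate."""
    pairs = defaultdict(dict)
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        library = f"{m['sample']}_bio{m['bio']}_tech{m['tech']}"
        role = "forward" if m["read"] == "1" else "reverse"
        pairs[library][role] = name
    # Every library must have both mates, otherwise it cannot form a pair.
    incomplete = [lib for lib, p in pairs.items() if set(p) != {"forward", "reverse"}]
    if incomplete:
        raise ValueError(f"libraries missing a mate: {incomplete}")
    return dict(pairs)


files = [
    "sample1_bio1_tech1_R1.fastq.gz",
    "sample1_bio1_tech1_R2.fastq.gz",
    "sample1_bio1_tech2_R1.fastq.gz",
    "sample1_bio1_tech2_R2.fastq.gz",
]
pairs = pair_fastq_files(files)
for library, mates in pairs.items():
    print(library, mates["forward"], mates["reverse"])
```

With this grouping, a list of pairs holds one element per technical replicate, and the technical replicates of the same biological replicate are combined only after counting.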
@s.godbole I highly recommend studying our training material for transcriptomics.
You will find many different cases with replicates and background information
Dear @bjoern.gruening,
Many thanks! I will surely look into this
Thank you very much
Best,
Shweta
hey @gallardoalba @bjoern.gruening ,
I looked through all your tutorials and now I am actually a bit more confused. Would it be better to build data pairs for R1 and R2 of my first technical replicate, and similarly for the 2nd technical replicate, then build a collection of them and merge the counts I get later?
OR
Should I build one collection with the R1 files of both technical replicates and another collection with the R2 files of both technical replicates, and then generate the count file after processing them this way?
Is there a right or wrong way of doing this?
Best,
Shweta