How to get log fold change for individual samples?

Hi,

I’m trying to analyze RNA-seq data from 8 samples (4 control and 4 treatment) using limma/DESeq2. The results output file provides the following columns:

ENTREZID ENSEMBL SYMBOL GENENAME logFC AveExpr t P.Value adj.P.Val B

These are the only columns in the file. I expected to get the fold change of each treated sample relative to the control. Could you please help me get the fold change for individual samples?

Hi @Nidheesh_Thadathil

These tools do not output calculations for individual samples directly. Instead, they only run on pools of replicates.

REF → FAQ: Extended Help for Differential Expression Analysis Tools

All differential expression tools require replicate count samples. The FAQ includes the rationale from two of the DESeq tool authors.

In this, they explain how to do this with R directly.

You can run R in Galaxy inside an interactive environment. RStudio is one choice → GTN Materials Search


However, I’m wondering if Limma might provide alternative results that may interest you? This tutorial covers many of the options and seems worth a review if you are interested. The data you have now is probably an appropriate input already (but you can navigate to the prior tutorial, and the downstream one) → Hands-on: 2: RNA-seq counts to genes / Transcriptomics

Hope this helps! :slight_smile:

Thank you @jennaj


I actually have the same question for limma-voom, since limma doesn’t output logFCs for individual samples, either. I’m doing a time course and trying to use ANOVA to compare fold changes across time points. Without individual sample logFCs I can’t really do those statistics.

Hi @Nidheesh_Thadathil

The people who wrote those tools do not think that individual samples are “enough” for these scientific statistics. Instead, the statistics are for different sample states, and replicate pools are required.

For a time series, that would mean two or more samples for each time point. Each time point will then have a metric describing relative expression differences versus the other time points.

The FAQ I linked above has a quote about this that links to the forum where that discussion was originally posted. You can probably find more discussion about the logic directly at their own forum → https://support.bioconductor.org/. You could also ask them for clarification if there is something new that is not covered.

I promise that I’m not pushing back! :slight_smile: I do think this comes up, and people are just not sure where the limitation is coming from… but I can let you know that this isn’t something special that is happening in Galaxy. These tools work the same wherever you are using them. Now, there are some things that can be done directly in R with a single replicate – including in a Galaxy R environment! – but those are not the full tools, and not these tools specifically. For instructions, the Bioconductor vignettes would be the best resource as far as I know.

Hope this helps, and if I misunderstood, you can clarify a bit more.

Thanks! By samples, I actually meant “replicates”. So I do have multiple replicates for each time point and condition. It’s just that the DE table limma outputs only gives an average logFC for all three replicates, not individual logFCs that it calculated its average from.
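To illustrate how an average logFC relates to per-replicate values, here is a minimal Python sketch. It assumes you have normalized log2 expression values for one gene (the numbers below are made up) and a simple two-group comparison. The function name `per_replicate_logfc` is hypothetical and not part of limma. limma-voom fits a weighted linear model, so its reported logFC will generally not match this unweighted calculation exactly.

```python
from statistics import mean


def per_replicate_logfc(treated_log2, control_log2):
    """Return each treated replicate's log2 value minus the control mean."""
    baseline = mean(control_log2)
    return [value - baseline for value in treated_log2]


# Hypothetical log2 expression values for one gene (illustration only).
control = [5.0, 5.2, 4.8]
treated = [6.1, 6.5, 6.0]

individual = per_replicate_logfc(treated, control)

# Unweighted group logFC: difference of the group means on the log2 scale.
group_logfc = mean(treated) - mean(control)

print("per-replicate logFC:", individual)
print("group logFC:", group_logfc)
print("mean of per-replicate logFC:", mean(individual))
```

In this unweighted setting, the mean of the per-replicate values equals the group logFC, which is why a single averaged logFC per gene can still be traced back to individual replicates when the normalized values are available.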

I learned that the t-statistic in the table is just

(average logFC)/(standard error)

and I was able to calculate my error bar by

logFC/t

and put that in my Prism table.
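The same arithmetic can be sketched in Python for several genes at once. This is only an illustration: the rows below are made-up values that use the `logFC` and `t` column names from the results table, and `standard_error` is a hypothetical helper. Because limma reports a moderated t-statistic, the value recovered this way should be read as the moderated standard error of the logFC, not a plain sample standard error.

```python
def standard_error(logfc, t_stat):
    """Recover the standard error of a logFC from its t-statistic (SE = logFC / t)."""
    if t_stat == 0:
        raise ValueError("t-statistic is zero; the standard error cannot be recovered")
    # logFC and t share the same sign, so the ratio is positive; abs() guards rounding.
    return abs(logfc / t_stat)


# Hypothetical rows using the logFC and t columns of the results table.
rows = [
    {"SYMBOL": "geneA", "logFC": 1.2, "t": 4.0},
    {"SYMBOL": "geneB", "logFC": -0.9, "t": -3.0},
]

for row in rows:
    row["SE"] = standard_error(row["logFC"], row["t"])
    print(row["SYMBOL"], row["logFC"], row["SE"])
```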

So all in all, it’s kinda solved for me.
