DESeq2: understanding the log2 fold change

Hi everyone, I am having some trouble interpreting my DESeq2 results and any help would be appreciated.
I am comparing 3 treatment groups: V (vehicle), A (treatment A), and A + B (treatment A + B). I am comparing each pair of groups independently.
When I entered my groups into Galaxy, I put the “A” group as my factor level 1 and the “A+B” group as my factor level 2. Does this mean that the log2 fold change values correspond to what is up-/down-regulated in my treatment “A” group compared to the “A+B” group?
So, for example, if I ran DESeq2 this way and got a log2 fold change of -2 for a gene, would that mean the gene was down-regulated by 2 (on the log2 scale) in my “A+B” group, or up-regulated by 2 in my “A” group?

Welcome, @marymusc

Short help

  • Between Factors, the first factor is the baseline and what other factors are compared to.
  • Within a Factor: the first factor level is the baseline and what other factor levels are compared to.
  • Therefore: negative means down-regulated and positive means up-regulated, relative to that baseline.

You should be able to flip the ordering to get (roughly) the same result but with a different positive/negative notation.
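To make the sign flip concrete, here is a minimal Python sketch. It is not what DESeq2 does internally (DESeq2 normalizes counts and fits a statistical model); it only takes the log2 of a ratio of two group means. The function name and the mean counts are made up for illustration.

```python
import math

def log2_fold_change(mean_compared, mean_baseline):
    """log2 of (compared group mean / baseline group mean)."""
    return math.log2(mean_compared / mean_baseline)

# Hypothetical normalized mean counts for one gene (made-up values).
mean_a = 100.0         # treatment "A"
mean_a_plus_b = 400.0  # treatment "A+B"

lfc_a_vs_ab = log2_fold_change(mean_a, mean_a_plus_b)  # A relative to A+B
lfc_ab_vs_a = log2_fold_change(mean_a_plus_b, mean_a)  # A+B relative to A

print(lfc_a_vs_ab)  # -2.0: 4-fold lower in A than in A+B
print(lfc_ab_vs_a)  # 2.0: 4-fold higher in A+B than in A
```

Both numbers describe the same difference between the groups. Only the choice of baseline changes, and with it the sign.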

TL;DR help :scientist:

The Bioconductor forum has many more details directly from the authors. This is one example search, and you can try more. https://support.bioconductor.org/post/search/?query=deseq2+first+factor

For how to model this on the Galaxy form, let’s start with the Help section on the tool form (scroll down to find this).

Inputs

Count Files

DESeq2 takes count tables generated from featureCounts, HTSeq-count or StringTie as input. Count tables must be generated for each sample individually. One header row is assumed, but files with no header (e.g. from HTSeq) can be input with the “Files have header?” option set to No. DESeq2 is capable of handling multiple factors that affect your experiment. The first factor you input is considered the primary factor that affects gene expression. Optionally, you can input one or more secondary factors that might influence your experiment, but the final output will be the changes in genes due to the primary factor in the presence of the secondary factors. Each factor has two levels/states. You need to select the appropriate count tables from your history for each factor level.

The following table gives some examples of factors and their levels:

| Factor    | Factor level 1 | Factor level 2 |
|-----------|----------------|----------------|
| Treatment | Treated        | Untreated      |
| Condition | Knockdown      | Wildtype       |
| TimePoint | Day4           | Day1           |
| SeqType   | SingleEnd      | PairedEnd      |
| Gender    | Female         | Male           |

Note: Output log2 fold changes are based on primary factor level 1 vs. factor level 2. Here the order of factor levels is important. For example, for the factor ‘Treatment’ given in the table above, DESeq2 computes fold changes of ‘Treated’ samples against ‘Untreated’, i.e. the values correspond to up- or down-regulation of genes in Treated samples.
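As an illustration that is not part of the tool help: taking the note above at face value (values are for factor level 1 relative to factor level 2), a log2 fold change can be turned into a plain-language statement like this. The helper name `describe_log2fc` is hypothetical and the input values are made up.

```python
def describe_log2fc(log2fc, level1, level2):
    """Describe a log2 fold change reported as level1 vs. level2."""
    if log2fc == 0:
        return f"no change between {level1} and {level2}"
    # A log2 fold change of x corresponds to a 2**|x|-fold difference.
    fold = 2 ** abs(log2fc)
    direction = "higher" if log2fc > 0 else "lower"
    return f"{fold:g}-fold {direction} in {level1} than in {level2}"

print(describe_log2fc(-2, "Treated", "Untreated"))
# 4-fold lower in Treated than in Untreated
print(describe_log2fc(1, "Treated", "Untreated"))
# 2-fold higher in Treated than in Untreated
```

The point is only that the value is on a log2 scale: each unit is a doubling or halving, so -2 means one quarter, not one half.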


Practical help

To see how this works with example data: run through a tutorial that uses the tool, and play around with that reduced data to test out how different results are generated. Find links at the bottom of the form to the GTN tutorials that include the tool.

You can load up the input data and the workflow, launch that quickly, then come back later to explore. You don’t need to do all the clicking around if you are just interested in the result exploration parts.

Then maybe flip the factor ordering and rerun parts to see what the differences are versus what would be expected (an “opposite” result). This is easier to do with a “known” curated and representative test data bundle, and can help you to understand what is going on in your “real” analysis that may have more complexities to sort out (newly discovered confounders, and similar). Review those Bioconductor posts, and maybe the tool vignette. Skim through the direct usage parts and instead focus on the scientific logic from the authors for the different use cases. If you find a usage that you cannot translate to the Galaxy form, we can try to help more with that here.

Let’s start there, and let us know how this works out for you. :slight_smile:

Thank you so much for your reply! I think I understand now, but just to confirm, please look at the attached picture if you have time. My first factor level/group is “JZL + AM281”, so this sets the baseline. My second factor level/group is “JZL”, so this is the treatment group that is being compared to the baseline. Therefore, the negative and positive values refer to up-/down-regulated genes in the second factor level group “JZL”. Is this correct?

Hi @marymusc

I would say yes if this was expanded to:

Therefore, the negative and positive values refer to up-/down-regulated genes in the second factor level group “JZL” in the presence of the first factor level group “JZL + AM281”.

Your first factor level counts are the baseline, and the changes reported are for the second factor level counts relative to that baseline.

The wording may seem a bit tedious, but all expression results are “relative”, not absolute. This is part of why the scientific organization for these sorts of analysis jobs is so particular.

You know your own data best … but are you sure that you want this organization, and not the reverse? I’m just going from how you are labeling your data. You don’t have to answer; it is just something to think about, and maybe another run-through with a different ordering will help you decide. Then, maybe consult a statistician if you are not sure what to do next.

Getting the jobs to run is the first step! Glad you have that working now :scientist:

Hope this helps! :slight_smile: