DESeq2: understanding the log2 fold change

Welcome, @marymusc

Short help

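  • Between factors: the first factor is the primary factor. The reported fold changes are for that factor, with any secondary factors accounted for.
  • Within a factor: the first factor level is compared against the second factor level, so the second level acts as the baseline (see the note quoted from the tool help further down).
  • Therefore: negative means down-regulated in the first level relative to the second, and positive means up-regulated.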

You should be able to flip the order of the factor levels and get (roughly) the same result, but with the positive/negative signs reversed.
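To make the sign flip concrete, here is a minimal sketch in Python. It is only an illustration: it uses a naive log2 ratio of two mean counts, not DESeq2's model-based estimate, and the function name, the pseudocount, and the counts are all made up for this example.

```python
import math


def naive_log2_fold_change(numerator_mean, denominator_mean, pseudocount=1.0):
    """Naive log2 ratio of two mean counts.

    Illustration only: DESeq2 fits a statistical model and does not
    compute fold changes this way. The pseudocount is an assumption
    added here to avoid dividing by zero.
    """
    return math.log2((numerator_mean + pseudocount) / (denominator_mean + pseudocount))


# Hypothetical mean normalized counts for a single gene
treated_mean = 399.0
untreated_mean = 99.0

# Level 1 = Treated, level 2 = Untreated (the baseline)
treated_vs_untreated = naive_log2_fold_change(treated_mean, untreated_mean)

# Same data with the level order flipped
untreated_vs_treated = naive_log2_fold_change(untreated_mean, treated_mean)

print(f"Treated vs Untreated: {treated_vs_untreated:+.2f}")  # +2.00
print(f"Untreated vs Treated: {untreated_vs_treated:+.2f}")  # -2.00
```

The magnitude stays the same and only the sign changes, which is the pattern to look for when you rerun the tool with the levels swapped.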

TL;DR help :scientist:

The Bioconductor forum has many more details directly from the authors. This is one example search, and you can try more. https://support.bioconductor.org/post/search/?query=deseq2+first+factor

For how to model this on the Galaxy form, let's start with the Help section on the tool form (scroll down to find this).

Inputs

Count Files

DESeq2 takes count tables generated from featureCounts, HTSeq-count or StringTie as input. Count tables must be generated for each sample individually. One header row is assumed, but files with no header (e.g. from HTSeq) can be input with the "Files have header?" option set to No.

DESeq2 is capable of handling multiple factors that affect your experiment. The first factor you input is considered the primary factor that affects gene expression. Optionally, you can input one or more secondary factors that might influence your experiment. The final output will still be the changes in genes due to the primary factor in the presence of the secondary factors. Each factor has two levels/states. You need to select the appropriate count tables from your history for each factor level.

The following table gives some examples of factors and their levels:

| Factor    | Factor level 1 | Factor level 2 |
|-----------|----------------|----------------|
| Treatment | Treated        | Untreated      |
| Condition | Knockdown      | Wildtype       |
| TimePoint | Day4           | Day1           |
| SeqType   | SingleEnd      | PairedEnd      |
| Gender    | Female         | Male           |

Note: Output log2 fold changes are based on primary factor level 1 vs. factor level 2. Here the order of factor levels is important. For example, for the factor 'Treatment' given in the table above, DESeq2 computes fold changes of 'Treated' samples against 'Untreated', i.e. the values correspond to up or down regulation of genes in Treated samples.
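To read a value from the results, it can help to convert the log2 fold change back to a plain fold change (fold change = 2 raised to the log2 fold change). The helper below is a hypothetical sketch for that conversion, assuming the level 1 vs. level 2 convention from the note above. It is not part of DESeq2 or Galaxy.

```python
def describe_log2_fold_change(log2fc, level1="Treated", level2="Untreated"):
    """Describe a log2 fold change (level1 vs. level2) in plain words.

    Hypothetical helper for illustration; the level names default to the
    'Treatment' example from the tool help.
    """
    fold = 2.0 ** abs(log2fc)  # size of the change on the linear scale
    if log2fc > 0:
        return f"{fold:.2f}x higher in {level1} than in {level2}"
    if log2fc < 0:
        return f"{fold:.2f}x lower in {level1} than in {level2}"
    return f"no change between {level1} and {level2}"


print(describe_log2_fold_change(1.0))   # 2.00x higher in Treated than in Untreated
print(describe_log2_fold_change(-3.0))  # 8.00x lower in Treated than in Untreated
```

So a value of 1 means expression doubled in the first level, and a value of -1 means it halved.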


Practical help

To see how this works with example data: run through a tutorial that uses the tool, and play around with that reduced data to test out how different results are generated. Find links at the bottom of the form to the GTN tutorials that include the tool.

You can load up the input data and the workflow, launch that quickly, then come back later to explore. You don't need to do all the clicking around if you are just interested in the result exploration parts.

Then maybe flip the factor ordering and rerun parts to see what the differences are versus what would be expected (an "opposite" result). This is easier to do with a "known" curated and representative test data bundle, and can help you to understand what is going on in your "real" analysis that may have more complexities to sort out (newly discovered confounders, and similar).

Review those Bioconductor posts, and maybe the tool Vignette. Skim through the direct usage parts and instead focus on the scientific logic from the authors for the different use cases. If you find a usage that you cannot translate to the Galaxy form, we can try to help more with that here.

Let's start there, and let us know how this works out for you. :slight_smile: