DESeq2: understanding the log2 fold change

jennaj · September 25, 2024, 5:53pm

Short help

Between factors: the first factor is the primary factor, and the reported fold changes are for that factor.
Within a factor: the first factor level is compared against the second level, so the second level acts as the baseline.
Therefore: negative means down-regulated and positive means up-regulated in the first level relative to the second.

You should be able to flip the ordering to get (roughly) the same result, but with the positive/negative signs reversed.
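To make the sign flip concrete, here is a minimal sketch in Python. It is not what DESeq2 does internally: it takes a plain log2 ratio of mean counts, and the function name, the pseudocount, and the counts are all made up for illustration. It only shows why swapping which group is the baseline reverses the sign.

```python
import math

def log2_fold_change(group_counts, baseline_counts, pseudocount=1.0):
    """Log2 ratio of mean counts: group (numerator) vs. baseline (denominator).

    A plain ratio of means for illustration only; DESeq2 itself fits a
    statistical model, so its reported values will differ.
    """
    group_mean = sum(group_counts) / len(group_counts)
    baseline_mean = sum(baseline_counts) / len(baseline_counts)
    # The pseudocount (an assumption of this sketch) avoids division by zero.
    return math.log2((group_mean + pseudocount) / (baseline_mean + pseudocount))

# Made-up counts for a single gene, three replicates per level.
treated = [120, 127, 134]
untreated = [30, 31, 32]

treated_vs_untreated = log2_fold_change(treated, untreated)
untreated_vs_treated = log2_fold_change(untreated, treated)

print(treated_vs_untreated)  # 2.0
print(untreated_vs_treated)  # -2.0
```

Same magnitude, opposite sign. In a real rerun, expect the values to be only roughly mirrored rather than identical.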

TL;DR help

The Bioconductor forum has many more details directly from the authors. This is one example search, and you can try more. https://support.bioconductor.org/post/search/?query=deseq2+first+factor

For how to model this on the Galaxy form, let’s start with the Help section on the tool form (scroll down to find this).

Inputs

Count Files

DESeq2 takes count tables generated from featureCounts, HTSeq-count or StringTie as input. Count tables must be generated for each sample individually. One header row is assumed, but files with no header (e.g. from HTSeq) can be input with the Files have header? option set to No.

DESeq2 is capable of handling multiple factors that affect your experiment. The first factor you input is considered the primary factor that affects gene expression. Optionally, you can input one or more secondary factors that might influence your experiment, but the final output will be the changes in genes due to the primary factor in the presence of the secondary factors. Each factor has two levels/states. You need to select the appropriate count tables from your history for each factor level.

The following table gives some examples of factors and their levels:

Factor	Factor level 1	Factor level 2
Treatment	Treated	Untreated
Condition	Knockdown	Wildtype
TimePoint	Day4	Day1
SeqType	SingleEnd	PairedEnd
Gender	Female	Male

Note: Output log2 fold changes are based on primary factor level 1 vs. factor level 2. Here the order of factor levels is important. For example, for the factor ‘Treatment’ in the table above, DESeq2 computes fold changes of ‘Treated’ samples against ‘Untreated’, i.e. the values correspond to up- or down-regulation of genes in Treated samples.
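If it helps to read the numbers, a log2 fold change can be turned back into a linear fold change with 2 raised to that value. The small Python helper below is a hypothetical illustration (the function name and wording are not part of DESeq2 or Galaxy). It assumes the value is reported as level 1 vs. level 2, as in the note above.

```python
def describe_log2_fold_change(log2_fc, level1="Treated", level2="Untreated"):
    """Turn a log2 fold change (level 1 vs. level 2) into a plain sentence."""
    if log2_fc == 0:
        return f"no change between {level1} and {level2}"
    fold = 2 ** abs(log2_fc)  # undo the log2 to get the linear fold change
    direction = "higher" if log2_fc > 0 else "lower"
    return f"{fold:.2f}-fold {direction} in {level1} than in {level2}"

print(describe_log2_fold_change(2.0))   # 4.00-fold higher in Treated than in Untreated
print(describe_log2_fold_change(-1.0))  # 2.00-fold lower in Treated than in Untreated
```

So a value of 1 is a doubling, and a value of -1 is a halving, always read from the point of view of level 1.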

Practical help

To see how this works with example data: run through a tutorial that uses the tool, and play around with that reduced data to test out how different results are generated. Find links at the bottom of the form to the GTN tutorials that include the tool.

You can load up the input data and the workflow, launch that quickly, then come back later to explore. You don’t need to do all the clicking around if you are just interested in the result exploration parts.

Then maybe flip the factor ordering and rerun parts to see how the results differ from what you would expect (an “opposite” result). This is easier to do with a “known” curated and representative test data bundle, and can help you to understand what is going on in your “real” analysis that may have more complexities to sort out (newly discovered confounders, and similar).

Review those Bioconductor posts, and maybe the tool Vignette. Skim through the direct usage parts and instead focus on the scientific logic from the authors for the different use cases. If you find a usage that you cannot translate to the Galaxy form, we can try to help more with that here.
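For the flip-and-rerun check, one way to compare the two result tables outside of Galaxy is sketched below in Python. Everything specific here is an assumption for illustration: the helper names, the column positions (gene ID in the first column, log2 fold change in the third), the headerless layout, the tolerance, and the rows themselves. Check the columns of your own downloaded results before adapting it.

```python
import csv
import io

def read_log2_fc(handle, gene_column=0, log2_fc_column=2):
    """Map gene ID -> log2 fold change from a headerless tab-separated table."""
    result = {}
    for row in csv.reader(handle, delimiter="\t"):
        result[row[gene_column]] = float(row[log2_fc_column])
    return result

def find_unmirrored(original, flipped, tolerance=0.1):
    """Return genes whose flipped-run value is not close to -1 * original."""
    shared = sorted(set(original) & set(flipped))
    return [gene for gene in shared if abs(original[gene] + flipped[gene]) > tolerance]

# Made-up rows: gene ID, mean expression, log2 fold change.
original_run = io.StringIO(
    "geneA\t100.0\t2.01\ngeneB\t50.0\t-1.20\ngeneC\t10.0\t0.40\n"
)
flipped_run = io.StringIO(
    "geneA\t100.0\t-2.00\ngeneB\t50.0\t1.21\ngeneC\t10.0\t0.40\n"
)

original = read_log2_fc(original_run)
flipped = read_log2_fc(flipped_run)
print(find_unmirrored(original, flipped))  # ['geneC']
```

With real files, replace the `io.StringIO` objects with `open("your_results.tabular", newline="")`. Any gene this flags is one where the flipped run did not give the expected opposite sign, which is a good place to start looking.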

Let’s start there, and let us know how this works out for you.