I have a question about the option “(Optional) provide a tabular file with additional batch factors to include in the model.” I want to use “age” as a continuous covariate in DESeq2 (as suggested by the DESeq2 author: DESeq2 with continuous variable). I am currently trying this by uploading a table for my samples that contains “age” (centered and scaled), to be included as a covariate in the analysis. Is this the right way to do so?
Welcome, @Martin_Pook
Yes, probably (based on this other post by Mike Love).
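For a quick local check before uploading, here is a minimal Python sketch of such a covariate table. The layout is an assumption, not taken from the Galaxy tool: a header row, a first column named `identifier` (a hypothetical name) holding the sample names exactly as they appear in Galaxy, then one column per covariate. Please compare it with the format described in the tool Help before relying on it.

```python
import csv
import io
import statistics


def zscore(values):
    """Center on the mean and divide by the sample standard deviation (n - 1), like R's scale()."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # requires at least two values
    return [(v - mean) / sd for v in values]


def covariate_table(ages_by_sample):
    """Return a tab-separated table with one row per sample and age centered and scaled.

    Assumed (hypothetical) layout: header row, first column 'identifier'
    with the sample names as used in Galaxy, then one column per covariate.
    """
    samples = list(ages_by_sample)
    scaled = zscore([ages_by_sample[s] for s in samples])
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["identifier", "age"])
    for sample, value in zip(samples, scaled):
        writer.writerow([sample, f"{value:.4f}"])
    return out.getvalue()


# Made-up ages for four hypothetical samples.
table = covariate_table({"sample1": 30, "sample2": 40, "sample3": 50, "sample4": 60})
print(table)
```

Centering and scaling mainly helps the model fit: DESeq2 itself prints a message recommending it when a numeric design variable has a large mean or standard deviation, because such a variable can be nearly collinear with the intercept.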
Are you getting an error? We can help troubleshoot any technical problems you might be having. Please see the banner on this forum for how to do that, or go directly to the topic How to get faster help with your question.
Most of the errors we see in Galaxy are simple data input issues, so I’m adding the help below in case it helps you or anyone else reading this later on. A very simple “time point” example is included in the table and may help.
The DESeq2 tool in Galaxy runs the same underlying DESeq2 program that is used anywhere else.
Reminders about how these Bioconductor tools expect data to be formatted → FAQ: Extended Help for Differential Expression Analysis Tools
An example of how to format the optional factor table is in the Help section of the Galaxy tool form (scroll down to find it); part of that Help is quoted below.
Inputs
Count Files
DESeq2 takes count tables generated by featureCounts, HTSeq-count or StringTie as input. Count tables must be generated for each sample individually. One header row is assumed, but files with no header (e.g. from HTSeq) can be input with the Files have header? option set to No. DESeq2 is capable of handling multiple factors that affect your experiment. The first factor you input is considered the primary factor that affects gene expression. Optionally, you can input one or more secondary factors that might influence your experiment. However, the final output will be the changes in genes due to the primary factor in the presence of the secondary factors. Each factor has two levels/states. You need to select the appropriate count tables from your history for each factor level.
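To illustrate what the Files have header? option is about, here is a minimal Python sketch that reads a per-sample count table with or without a header row. It assumes the simple two-column layout (gene ID, count) and is only meant for inspecting files locally; it is not how the Galaxy wrapper parses them. Skipping the `__` rows is a choice made for this example.

```python
def read_counts(text, has_header=True):
    """Parse a two-column, tab-separated per-sample count table into {gene_id: count}.

    Rows whose ID starts with '__' are HTSeq-count's summary counters
    (e.g. __no_feature, __ambiguous), not genes, so this sketch skips them.
    """
    lines = text.strip().splitlines()
    if has_header:
        lines = lines[1:]  # drop the single header row
    counts = {}
    for line in lines:
        gene, count = line.split("\t")[:2]
        if gene.startswith("__"):
            continue
        counts[gene] = int(count)
    return counts


# HTSeq-count style output has no header row.
htseq_output = "geneA\t10\ngeneB\t0\n__no_feature\t5\n"
print(read_counts(htseq_output, has_header=False))
```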
The following table gives some examples of factors and their levels:
Factor | Factor level 1 | Factor level 2 |
---|---|---|
Treatment | Treated | Untreated |
Condition | Knockdown | Wildtype |
TimePoint | Day4 | Day1 |
SeqType | SingleEnd | PairedEnd |
Gender | Female | Male |
Note: Output log2 fold changes are based on primary factor level 1 vs. factor level 2, so the order of factor levels is important. For example, for the factor ‘Treatment’ given in the table above, DESeq2 computes fold changes of ‘Treated’ samples against ‘Untreated’, i.e. the values correspond to up- or down-regulation of genes in Treated samples.
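The direction convention can be seen with a toy calculation. This Python sketch uses a naive ratio of mean counts with a pseudocount; it is not how DESeq2 estimates fold changes (it fits a negative binomial GLM), and it only shows that a positive value means higher expression in factor level 1. The counts are made up.

```python
import math
from statistics import mean


def naive_log2_fold_change(level1_counts, level2_counts, pseudocount=1.0):
    """log2 of (mean of level 1) / (mean of level 2); the pseudocount avoids log(0).

    Only illustrates the sign convention: positive means higher in level 1.
    """
    return math.log2((mean(level1_counts) + pseudocount) / (mean(level2_counts) + pseudocount))


treated = [200, 220, 180]   # hypothetical normalized counts, factor level 1
untreated = [50, 45, 55]    # hypothetical normalized counts, factor level 2
lfc = naive_log2_fold_change(treated, untreated)
print(round(lfc, 2))  # positive: up in Treated relative to Untreated
```

Swapping the two levels flips the sign, which is why the order in which you enter the levels on the form matters.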
Some additional advice on how to do this if you have multiple factors and factor levels…
- Consider grouping the count files for each factor level into collections (folders). The count files can even be generated directly within collections. Alternatively, you can process all the data in one collection and use group tags. Another recent Q&A topic covers these details → Error when running DeSeq2: help resources and input strategies - #4 by jennaj
- Set up the form. Give each factor a distinct simple name (letters, numbers and underscores only, not starting with a number, no spaces), and give each factor level a distinct simple name. Then select the appropriate count files for each factor level.
- All of this could be put into a workflow, or you can extract a workflow afterward for reuse. You may also be able to adapt one of the existing workflow templates.
- DESeq2 expects individual count files and does not accept a count matrix. limma and edgeR accept both.
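The naming rule for factors and factor levels can be checked before filling in the form. A minimal Python sketch; the regular expression is my reading of the rule above (letters, digits and underscores, no leading digit), not Galaxy's own validation code, and the function names are hypothetical.

```python
import re

# Letters, digits and underscores only; the first character must not be a digit.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_name(name):
    return VALID_NAME.fullmatch(name) is not None


def check_form_names(names):
    """Return a list of problems: names breaking the rule, then repeated names."""
    problems = [f"invalid name: {n!r}" for n in names if not is_valid_name(n)]
    seen = set()
    for n in names:
        if n in seen:
            problems.append(f"duplicate name: {n!r}")
        seen.add(n)
    return problems


for name in ["Treatment", "Day_4", "4Day", "Knock down", "Wild-type"]:
    print(name, is_valid_name(name))
print(check_form_names(["Treated", "Untreated"]))
```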
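If your counts are already in a single matrix, one way to get the per-sample files DESeq2 expects is to split the matrix column by column before uploading. A minimal Python sketch, assuming a tab-separated matrix with a header row (gene ID column first, then one column per sample); the function name is hypothetical.

```python
import csv
import io


def split_count_matrix(matrix_text):
    """Split a tab-separated gene x sample count matrix into per-sample tables.

    Assumes a header row 'GeneID<TAB>sample1<TAB>sample2...' and one row per gene.
    Returns {sample_name: two-column tab-separated text with a header row}.
    """
    rows = list(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    gene_col, samples = header[0], header[1:]
    per_sample = {}
    for j, sample in enumerate(samples, start=1):
        out = io.StringIO()
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow([gene_col, sample])
        for row in body:
            writer.writerow([row[0], row[j]])  # gene ID plus this sample's count
        per_sample[sample] = out.getvalue()
    return per_sample


matrix = "GeneID\tctrl_1\ttreat_1\ngeneA\t10\t40\ngeneB\t3\t0\n"
for name, text in split_count_matrix(matrix).items():
    print(name)
    print(text)
```

Each string can then be saved to its own file (for example one file per sample name) and uploaded; because each file keeps a header row, the Files have header? option would stay at Yes.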
Thank you!
This is encouraging. In the meantime, I also found that you can confirm how the design model was set up by looking at the Tool Standard Output of the job in Galaxy.