limma-voom multiple factors & groups

Hi Galaxy Community,

I am trying to use the limma-voom tool and have been following the “Genes to Counts” training. The training gives a tip about “multiple factors”, but the tip does not give an example of how the input changes when entering the “contrast of interest”. I have tried various inputs to figure this out on my own, but I always get an error, and from the errors it seems that the program will not read past the 2nd column of either the factor file or the manually entered factors.

Here is what my factor file looks like:

What I am trying to do here is take all of the samples from the “Reaction” factor that are in group “A” and contrast the “Treatment” factor groups “T-P”. I keep receiving an error stating that object “P” cannot be found, which leads me to believe that the program is only reading the 1st column (sample IDs) and the 2nd column (the “Reaction” factor). I tried entering the multiple factors and groups manually, but either way only the first factor and its groups are accessed.

Can someone please show me how to input the “contrast of interest” to account for multiple factors and groups, or can someone explain it to me differently (maybe I am missing something here)?


Hi @montgopw

Correct, the contrast string can only contain one - per grouping. Why? The hyphen is interpreted as part of the expression, not read as plain text.

You could input two different lines, or two different blocks on the form. Or create a compound term with parentheses. The form has examples of this right under each input area, plus a link to the Bioconductor manual (these terms are passed directly to the underlying tool, so the formatting is identical).
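For anyone reading along, here is a minimal Python sketch of why a contrast such as “T-P” can fail with an "object not found" style error. It is not the tool's code. It assumes, as the error in this thread suggests, that the two names in a contrast are looked up among the groups of the first factor (the 2nd column). The helper names and the sample rows (including group “B”) are made up for illustration.

```python
def groups_of_first_factor(rows):
    """Return the distinct group labels in column 2 (the first factor), in order seen."""
    seen = []
    for row in rows:
        if row[1] not in seen:
            seen.append(row[1])
    return seen


def check_contrast(contrast, groups):
    """Split 'X-Y' on its single hyphen and return the names that are not known groups."""
    parts = [name.strip() for name in contrast.split("-")]
    if len(parts) != 2:
        raise ValueError(f"expected exactly one '-' in {contrast!r}")
    return [name for name in parts if name not in groups]


# Hypothetical factor table: sample ID, first factor, second factor.
rows = [
    ("S1", "A", "T"),
    ("S2", "A", "P"),
    ("S3", "B", "T"),
    ("S4", "B", "P"),
]

groups = groups_of_first_factor(rows)
print(groups)                          # ['A', 'B']
print(check_contrast("A-B", groups))   # [] (both names are groups of the first factor)
print(check_contrast("T-P", groups))   # ['T', 'P'] (neither name is known)
```

Under that assumption, “T” and “P” only become valid contrast names once they appear as groups of the factor being contrasted.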

And, maybe your example was simplified … but is there a reason why all treatment groups couldn’t have the same designation for this run? Meaning, the second column of your file could have just T for all. That would simplify your input. If you are using collections with group tags, it would be pretty quick to relabel the group tags for a collection that way (See Hands-on: Group tags for complex experimental designs / Using Galaxy and Managing your Data).

Let me know if I am misunderstanding. And, I’m guessing that you have seen this simple example already, but I’m linking it in for others reading → Hands-on: 2: RNA-seq counts to genes / Transcriptomics

Thank you, this has helped solve the issue.

I do, however, have an additional question. When creating the factor file, how do you incorporate your groups? It is my understanding that the 1st column holds your samples (matching the count file), the 2nd column is your primary factor, and any additional columns count as additional factors. How do I include the names of the groups for the factors in the factor file?

Hi @montgopw

Hopefully this helps but let me know! I’m writing this out as a reference in case others have the same questions. It is a bit confusing!

Factor

  • This is the top level on the form.
  • You can have just one or several.
  • Some Bioconductor tools alternatively name this as a Condition.
  • Limma names it as a Factor.

Group

  • This is the second level on the form.
  • These represent sub-groups under a Factor, and two or more are required.
  • Some Bioconductor tools alternatively name this as a Factor level.
  • Limma names this a Group.

Factor file

This is the example on the form

Sample Genotype Batch
WT1 WT b1
WT2 WT b2
WT3 WT b3
Mut1 Mut b1
Mut2 Mut b2
Mut3 Mut b3

So, another way to write this could be

Sample FactorA FactorB
Sample1 Group1 Group3
Sample2 Group1 Group4
Sample3 Group1 Group5
Sample4 Group2 Group3
Sample5 Group2 Group4
Sample6 Group2 Group5
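To make the Factor/Group relationship concrete, here is a small Python sketch that reads the example from the form and lists the groups found under each factor. It assumes the file is saved as tab-separated text with a header row. The function name `factors_and_groups` is hypothetical and is not part of the tool.

```python
import csv
import io

# The example from the form, written as tab-separated text.
FACTOR_FILE = (
    "Sample\tGenotype\tBatch\n"
    "WT1\tWT\tb1\n"
    "WT2\tWT\tb2\n"
    "WT3\tWT\tb3\n"
    "Mut1\tMut\tb1\n"
    "Mut2\tMut\tb2\n"
    "Mut3\tMut\tb3\n"
)


def factors_and_groups(text):
    """Map each factor (column header) to the distinct groups (cell values) below it."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    factor_names = header[1:]          # column 1 holds the sample IDs
    factors = {name: [] for name in factor_names}
    for row in reader:
        if not row:                    # skip blank lines
            continue
        for name, value in zip(factor_names, row[1:]):
            if value not in factors[name]:
                factors[name].append(value)
    return factors


print(factors_and_groups(FACTOR_FILE))
# {'Genotype': ['WT', 'Mut'], 'Batch': ['b1', 'b2', 'b3']}
```

So the groups are never listed separately: they are simply the distinct values that appear in each factor's column.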

The example in this tutorial is another great reference → Hands-on: 2: RNA-seq counts to genes / Transcriptomics

I think I did that right 🙂 but let me know if you have any questions.

Yes, thank you for this clarification. What I am attempting to do is use the limma-voom tool, and I want to create an Excel factor file. I have gone over the “RNA-seq counts to genes” tutorial, but what it does not explain is how to include the groups in this file.

For example, if you use the snippet of the factor file I provided earlier, you see the 1st column holds the samples, the 2nd “factor 1”, and the 3rd “factor 2”. I would like to include the groups “group1” & “group2”.

Am I following your explanation correctly: to do this, the factor becomes the column header, followed by the groups in the rows below it?

Yes. The tool then interprets the table of data to sort out the count files, and constructs the command line that runs on the clusters.
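Since the sample IDs in the 1st column need to match the count file, a quick check before running can save a failed job. The Python sketch below is only an illustration: it assumes the count matrix has a header row whose first column is a gene identifier, and the names `compare_samples`, `GeneID` and `Mut_3` are hypothetical.

```python
def compare_samples(count_header, factor_samples):
    """Return (samples only in the counts, samples only in the factor file)."""
    count_samples = count_header[1:]   # skip the gene ID column
    only_in_counts = [s for s in count_samples if s not in factor_samples]
    only_in_factors = [s for s in factor_samples if s not in count_samples]
    return only_in_counts, only_in_factors


# Hypothetical header of a count matrix and the 1st column of a factor file.
count_header = ["GeneID", "WT1", "WT2", "WT3", "Mut1", "Mut2", "Mut3"]
factor_samples = ["WT1", "WT2", "WT3", "Mut1", "Mut2", "Mut_3"]  # note the typo

print(compare_samples(count_header, factor_samples))
# (['Mut3'], ['Mut_3'])
```

Two empty lists mean every sample ID is present in both files.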

Please give that a try 🙂