List with zero data sets in Limma voom

shrutikane · August 9, 2024, 9:04am

I am having trouble with limma-voom. The output folder shows a list with zero datasets. Could anybody please help?

What should the next step be? I need the data urgently.

jennaj · August 9, 2024, 7:13pm

Thanks for the great screenshot, very helpful!

The missing count replicates are at least one problem I can see right now.

If you open the Tool Standard Error (stderr), it will probably report that the tool is expecting “replicates” and didn’t find them (unless some other problem was encountered first!).

In practical use, this means two or more Factor levels with two or more count files each. So a simple run needs, at minimum, one Factor, two Factor levels, and four Sample count files. All of the differential expression tools from Bioconductor require replicates.
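As a rough illustration of that minimum, here is a small Python sketch that checks a one-factor layout before running the tool. The mapping from count-file names to factor levels, and the file names themselves, are hypothetical; only the thresholds (two levels, two files per level) come from the rule above.

```python
from collections import Counter


def check_replicates(sample_to_level, min_levels=2, min_reps=2):
    """Return a list of problems with a one-factor design (empty list = OK).

    sample_to_level maps each count-file name to its factor level.
    """
    counts = Counter(sample_to_level.values())
    problems = []
    if len(counts) < min_levels:
        problems.append(f"need at least {min_levels} factor levels, found {len(counts)}")
    for level, n in sorted(counts.items()):
        if n < min_reps:
            problems.append(f"level {level!r} has {n} sample(s); need at least {min_reps}")
    return problems


# Hypothetical file names, for illustration only.
too_small = {"ctrl_1.tsv": "Control", "treated_1.tsv": "Treated"}
minimal = {
    "ctrl_1.tsv": "Control", "ctrl_2.tsv": "Control",
    "trt_1.tsv": "Treated", "trt_2.tsv": "Treated",
}
print(check_replicates(too_small))
print(check_replicates(minimal))  # → []
```

The first call reports one problem per level, because each level has only a single count file.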

We have some help about this and other common usage help in a GTN guide here → FAQ: Extended Help for Differential Expression Analysis Tools.

Quote from that guide

Differential expression tools all require sample count replicates. Rationale from two of the DESeq tool authors.

At least two factor levels/groups/conditions with two samples each.

All must contain unique content for valid scientific results.
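One quick way to catch accidental copies (for example, the same count file uploaded twice under two names) is to compare content hashes. This is a minimal Python sketch, assuming you have the raw bytes of each file; it only detects byte-for-byte identical files, not near-duplicates, and the file names and contents are made up.

```python
import hashlib
from collections import defaultdict


def find_identical_files(contents):
    """Group file names whose contents are byte-for-byte identical.

    contents maps a file name to its raw bytes (e.g. read with open(path, "rb")).
    Returns a list of groups (sorted lists of names) that share the same content.
    """
    by_digest = defaultdict(list)
    for name, data in contents.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


files = {
    "ctrl_1.tsv": b"gene\tcount\nA\t10\n",
    "ctrl_2.tsv": b"gene\tcount\nA\t10\n",  # accidental copy of ctrl_1
    "trt_1.tsv": b"gene\tcount\nA\t42\n",
}
print(find_identical_files(files))  # → [['ctrl_1.tsv', 'ctrl_2.tsv']]
```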

For definitive help, please refer to the tool vignette and publications (linked at the bottom of the Galaxy tool form), plus see our Limma Tutorials for more Galaxy examples, including sample data, methods, and a workflow template.

In short: usage is about the same across all platforms (R, Galaxy, or a notebook of any kind) since the underlying tool is the same everywhere, and we can help with the “translate to a Galaxy tool form” parts here at this forum.

It looks like you are following a tutorial of some sort but I don’t recognize it. You can refer the instructor to this post, and you or they are welcome to ask more questions. Please include a link to the tutorial as a reference, along with a shared history link (with at least the starting data run through upstream steps) if at all possible.

Hope this helps!

Xref EdgeR Tutorial for Differential Gene Expression - #2 by jennaj

shrutikane · August 10, 2024, 5:37am

Thank you so much for the detailed analysis.

This was the tutorial on YouTube.
According to my understanding, I will have to input multiple files in a group for the factor I am testing.
Can you guide me on how to create the shared history link? I will be happy to share it.

jennaj · August 12, 2024, 9:01pm

Hi @shrutikane

Yes – this is the minimum

I took a peek at this video at the Limma chapter, and it also uses multiple files per factor level, so all of this advice matches up.

shrutikane · August 14, 2024, 3:22am

I have uploaded 3 files for the control, and I have 3 treated conditions, so I have 3 files for each condition because the experiment was performed in triplicate.
I don’t have 4 count files. What should be done about this?

jennaj · August 14, 2024, 4:20pm

Hi @shrutikane

It sounds like you have “enough” count files. Organize them like this:

One Factor
Two Factor levels
Then add your three Sample count files to each of those two Factor levels

Is that what you tried? In your latest screenshot, we can’t see enough of the Job Parameters → Input Parameter table to check.
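To picture the organization described above (one Factor, two Factor levels, three count files each), here is a small Python sketch that turns a sample-to-level mapping into a 0/1 design matrix with one column per level. It mirrors the idea of R’s model.matrix(~0 + group) for a single factor; it is an illustration under that assumption, not the Galaxy tool’s own code, and the sample names are hypothetical.

```python
def design_matrix(sample_to_level):
    """Build a 0/1 design matrix with one column per factor level.

    Illustrative only: mirrors the idea of R's model.matrix(~0 + group)
    for a single factor, not the Galaxy tool's internal code.
    """
    levels = sorted(set(sample_to_level.values()))
    samples = list(sample_to_level)
    rows = [[int(sample_to_level[s] == lvl) for lvl in levels] for s in samples]
    return samples, levels, rows


# Hypothetical layout: one Factor, two Factor levels, three count files each.
layout = {
    "control_rep1": "Control", "control_rep2": "Control", "control_rep3": "Control",
    "PA_rep1": "PA", "PA_rep2": "PA", "PA_rep3": "PA",
}
samples, levels, rows = design_matrix(layout)
print("sample", *levels, sep="\t")
for name, row in zip(samples, rows):
    print(name, *row, sep="\t")

# Each column total is the number of replicates for that level.
replicates = dict(zip(levels, map(sum, zip(*rows))))
print(replicates)  # → {'Control': 3, 'PA': 3}
```

A column total below 2 would mean that level has no replicates, which is the problem from the first reply.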

And have you clicked on the Job Information section to expand it yet? The black boxes in your screenshot are what I am talking about. Click on each of these: the Tool Standard Output (stdout) and Tool Standard Error (stderr) boxes.

You can either post back new screenshots that include that content, or generate a share link to your history and post that back here. FAQ: Sharing your History (see the first section “1” for how to do this). The links in the banner at this forum also explain what we are looking at for this troubleshooting.

The data itself could potentially still have a problem but we can follow up more about that depending on the content I’m asking for above.

We can probably sort this out! Learning where to find and interpret this kind of job information will help you going forward with just about any computational work, whether you run it in Galaxy or elsewhere.

shrutikane · August 15, 2024, 11:31am

This is the history share link.
Just to confirm:
1. Factor: Fatty Acid Treatment
2. Factor levels: Control, OA, PA, OA+PA
3. Triplicate files for each level

job parameters file

jennaj · August 15, 2024, 6:28pm

Hi @shrutikane

Thanks for the screenshot, and for the shared history link.

These are your job run details

Comparing the inputs in your screenshot with that error message about the “Treated” value, I see one problem: notice how the Contrast of Interest parameter was set as

Control-Treated

but your factor levels are named

Control and PA

For Contrast of Interest try using Control-PA instead to see if that is enough.
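The mismatch can be caught before submitting the job by checking each term of the contrast against the factor level names. This Python sketch only handles the simple two-term LevelA-LevelB form (real limma contrasts can be richer expressions), and the level list is taken from the example above.

```python
def check_contrast(contrast, levels):
    """Check a simple 'LevelA-LevelB' contrast against the factor level names.

    Only the two-term form is handled; limma itself accepts richer expressions.
    Returns a list of problems (empty list = every term is a known level).
    """
    terms = [t.strip() for t in contrast.split("-")]
    if len(terms) != 2 or not all(terms):
        return [f"expected the form 'LevelA-LevelB', got {contrast!r}"]
    return [
        f"{t!r} is not a factor level (levels: {', '.join(levels)})"
        for t in terms
        if t not in levels
    ]


levels = ["Control", "PA"]
print(check_contrast("Control-Treated", levels))
print(check_contrast("Control-PA", levels))  # → []
```

The first call flags “Treated”, which matches the error message in the job details.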

Please give that a try and let us know how it works! I didn’t notice anything else off at this top level yet but we can follow up.

shrutikane · August 16, 2024, 2:26am

This is what is shown

jennaj · August 16, 2024, 5:08pm

Hi @shrutikane

You are making progress on this! Would you be able to expand the third black box in your screenshot: Tool Standard Error. We can see part of the message but not all of it.

Searching the forum for the Bioconductor tools (including Limma) with a portion of what we can see finds this → https://support.bioconductor.org/post/search/?query="names"+must+be+a+character+vector

And since the message also includes this

make.unique(geneanno[, 2])

it suggests that the second column of your annotation file (the Gene Annotations input in your screenshot, dataset 225) is where you will want to look next. How is that file formatted? You can click on the dataset to expand it. That will show the first few lines (“peek view”), and you can see the whole dataset using the eye icon.
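A working guess, not confirmed from your data: R’s make.unique() only accepts a character vector, so if the second column holds numbers (for example, Entrez IDs in a four-column file), R may read it as numeric and the call can fail with this kind of message. This Python sketch summarizes column 2 of a tab-separated annotation file with a header row; the example rows and IDs are made up.

```python
import csv
import io


def inspect_label_column(tsv_text):
    """Summarise column 2 of a tab-separated annotation file with a header row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    labels = [row[1] if len(row) > 1 else "" for row in body]
    present = [v for v in labels if v not in ("", "NA")]
    seen, duplicated = set(), set()
    for value in present:
        if value in seen:
            duplicated.add(value)
        seen.add(value)
    return {
        "column_name": header[1] if len(header) > 1 else None,
        "missing": len(labels) - len(present),
        # True when every non-missing label is a plain integer, e.g. Entrez IDs.
        "all_numeric": bool(present) and all(v.isdigit() for v in present),
        "duplicated": sorted(duplicated),
    }


# Made-up rows: a four-column layout versus the expected three-column layout.
four_columns = (
    "ENSEMBL\tENTREZID\tSYMBOL\tGENENAME\n"
    "ENSG_A\t1001\tGeneA\tgene A\n"
    "ENSG_B\t1002\tGeneB\tgene B\n"
)
three_columns = (
    "ENSEMBL\tSYMBOL\tGENENAME\n"
    "ENSG_A\tGeneA\tgene A\n"
    "ENSG_B\tGeneB\tgene B\n"
)
print(inspect_label_column(four_columns))
print(inspect_label_column(three_columns))
```

If column 2 turns out to be all numbers, that is a strong hint the labels the tool expects are in a different column.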

Compare your file to the instructions on the tool form in the Help section.

Quote

Gene Annotations: Optional input for gene annotations, this can contain more information about the genes than just an ID number. The annotations will be available in the differential expression results table and the optional normalised counts table. They will also be used to generate interactive Glimma Volcano, MD plots and tables of differential expression. The input annotation file must contain a header row and have the gene IDs in the first column. The second column will be used to label the genes in the Volcano plot and interactive Glimma plots, additional columns will be available in the Glimma interactive table. The number of rows should match that of the counts files, add NA for any gene IDs with no annotation. The Galaxy tool annotateMyIDs can be used to obtain annotations for human, mouse, fly and zebrafish.

Example:

GeneID	Symbol	GeneName
11287	Pzp	pregnancy zone protein
11298	Aanat	arylalkylamine N-acetyltransferase
11302	Aatk	apoptosis-associated tyrosine kinase
11303	Abca1	ATP-binding cassette, sub-family A (ABC1), member 1
11304	Abca4	ATP-binding cassette, sub-family A (ABC1), member 4
11305	Abca2	ATP-binding cassette, sub-family A (ABC1), member 2

So, the tool is expecting three columns of data.

But reviewing your settings for the AnnotateMyIDs run, it seems that you chose to output four columns of data.

This is a screenshot of your AnnotateMyIDs run’s job information view.

The tool expects the first column in the featureCounts input files to match the first column in the annotateMyIDs file. For you, those are ENSEMBL gene identifiers.
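To check that match yourself, this Python sketch compares the first-column IDs of a count file with those of an annotation file, using the tool Help’s expectations (same number of rows, same IDs). The ID lists are hypothetical; in practice you would read them from the first column of each file, skipping the header.

```python
from collections import Counter


def compare_gene_ids(count_ids, annotation_ids):
    """Compare the first-column gene IDs of a count file and an annotation file."""
    in_counts, in_annotation = set(count_ids), set(annotation_ids)
    repeats = Counter(annotation_ids)
    return {
        "same_row_count": len(count_ids) == len(annotation_ids),
        "missing_from_annotation": sorted(in_counts - in_annotation),
        "not_in_counts": sorted(in_annotation - in_counts),
        "repeated_in_annotation": sorted(i for i, n in repeats.items() if n > 1),
    }


# Hypothetical IDs: one ID appears twice in the annotation
# (for example, from a one-to-many mapping).
counts = ["ENSG_A", "ENSG_B", "ENSG_C"]
annotation = ["ENSG_A", "ENSG_A", "ENSG_B", "ENSG_C"]
print(compare_gene_ids(counts, annotation))
```

Here every ID matches, but the repeated row makes the row counts differ, which is the kind of “duplicates” problem discussed below.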

What to try next

Rerun the annotateMyIDs tool, and when choosing the output columns to generate, output just the three columns the next tool is expecting.

GeneID	Symbol	GeneName

For you, that will be

ENSEMBL SYMBOL GENENAME

Question: why not manipulate the current output (cut columns)?

Answer: both tools, Limma and annotateMyIDs, seem to be reporting that they found “duplicates” in the gene identifier listings. That is probably because of a one-to-many mapping between ENSEMBL and ENTREZ identifiers.

So, check the output from the new run to make sure that isn’t reported again. If it is, you can run the tool again with the annotateMyIDs option “Remove duplicates?” set to Yes.

Remove duplicates?

No

If this option is set to Yes, only the first occurrence of each input Gene ID will be kept. Default: No
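The behavior described in that help text (keep only the first occurrence of each input Gene ID) can be sketched in a few lines of Python. This reproduces the described rule, not the tool’s own code, and the rows are made up.

```python
def keep_first_occurrence(rows):
    """Keep the first row for each gene ID (column 1), preserving file order.

    Later rows with an already-seen ID are dropped, so which mapping
    survives depends on the order of the input rows.
    """
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


# Made-up annotation rows (header excluded) with a one-to-many mapping for ENSG_A.
rows = [
    ["ENSG_A", "GeneA", "gene A"],
    ["ENSG_A", "GeneA2", "gene A, alternative mapping"],
    ["ENSG_B", "GeneB", "gene B"],
]
for row in keep_first_occurrence(rows):
    print(*row, sep="\t")
```

Keeping the first row is simple and predictable, but the mapping that survives is whichever came first, so it is worth a quick look at which genes were affected.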

This is a good example of different data being processed through a step in one of our GTN tutorials. If you want to compare, this is the direct link to the step. → Hands-on: 2: RNA-seq counts to genes / Transcriptomics

Please give that a try and let us know how it works!

shrutikane · August 18, 2024, 6:04am

The correction in the annotation file has worked. Thankyou so much for your continuous guidance and help. Will be proceeding with further analysis now. Will put a request if required.
Thankyou
