I am having trouble with limma voom. The output folder shows a list with zero datasets. Please help if anybody can.
What should be the next step? I need the data urgently.
Welcome, @shrutikane
Thanks for the great screenshot, very helpful!
Not including count replicates is what I can see as at least one problem right now.
If you opened up the Tool Standard Error (stderr) it will probably report that the tool is expecting “replicates” and didn’t find them (unless some other problem was encountered first!).
In practical use, this means two or more Factor levels with two or more count files each. So, a minimum of one Factor, two Factor levels, four Sample count files for a simple run. All of the differential expression tools written by Bioconductor require replicates.
We have some help about this and other common usage help in a GTN guide here → FAQ: Extended Help for Differential Expression Analysis Tools.
Quote from that guide
- Differential expression tools all require sample count replicates. Rationale from two of the DEseq tool authors.
- At least two factor levels/groups/conditions with two samples each.
- All must contain unique content for valid scientific results.
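The replicate rule above can be checked before launching a job. This is only a rough sketch and not part of Limma or Galaxy: the function name `check_replicates` and the sample names are hypothetical, and the thresholds simply restate the "two levels, two samples each" minimum from the guide.

```python
from collections import Counter


def check_replicates(sample_to_level, min_levels=2, min_per_level=2):
    """Return a list of design problems; an empty list means the minimum is met.

    sample_to_level maps a (hypothetical) sample name to its factor level.
    """
    counts = Counter(sample_to_level.values())
    problems = []
    if len(counts) < min_levels:
        problems.append(
            f"need at least {min_levels} factor levels, found {len(counts)}"
        )
    for level, n in sorted(counts.items()):
        if n < min_per_level:
            problems.append(
                f"level {level!r} has {n} sample(s), need at least {min_per_level}"
            )
    return problems


# Hypothetical sample names: one factor, two levels, two count files each.
design = {"ctrl_1": "Control", "ctrl_2": "Control", "pa_1": "PA", "pa_2": "PA"}
print(check_replicates(design))  # []
```

An empty list only means the counts per level are sufficient; it says nothing about whether the files contain unique content.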
For definitive help, please refer to the tool vignette and publications (linked at the bottom of the Galaxy tool form), plus see our Limma Tutorials for more Galaxy examples, including sample data, methods, and a workflow template.
In short: usage is about the same between all platforms (R or Galaxy or a Notebook of any kind) since the underlying tool is the same everywhere… and we can help with the “translate to a Galaxy tool form” parts here at this forum.
It looks like you are following a tutorial of some sort but I don’t recognize it. You can refer the instructor to this post, and you or they are welcome to ask more questions. Please include a link to the tutorial as a reference, along with a shared history link (with at least the starting data run through upstream steps) if at all possible.
Hope this helps!
Xref EdgeR Tutorial for Differential Gene Expression - #2 by jennaj
Thank you so much for the detailed analysis.
This was the tutorial on YouTube.
According to my understanding, I will have to input multiple files in a group for the factor I am testing.
Can you guide me on how to create the shared history link? I will be happy to share it.
Hi @shrutikane
Yes – this is the minimum
I took a peek at this video at the Limma chapter, and it also uses multiple files per factor level, so all of this advice is matching up.
Hi @shrutikane
It sounds like you have “enough” count files:
Organizing it like:
Is that what you tried? In your latest screenshot, we can’t see enough of the Job Parameters → Input Parameter table to check.
And, did you click on the Job Information section to expand it yet? The black boxes in your screenshot are what I am talking about. Click on each of these: the Tool Standard Output (stdout) and the Tool Standard Error (stderr) boxes.
You can either post back new screenshots that include that content, or generate a share link to your history and post that back here. FAQ: Sharing your History (see the first section “1” for how to do this). The links in the banner at this forum also explain what we are looking at for this troubleshooting.
The data itself could potentially still have a problem but we can follow up more about that depending on the content I’m asking for above.
We can probably sort this out! Learning where to find and interpret this kind of job information will help you going forward with just about any computational work, whether you are running it in Galaxy or elsewhere.
this is the history share link
Just to confirm
1. Factor: Fatty Acid Treatment
2. Factor levels: Control, OA, PA, OA+PA
3. Triplicate for each file
Hi @shrutikane
Thanks for the screenshot, and for the shared history link.
These are your job run details
Comparing what was input in your screenshot, and that error message about the “Treated” value, I see one problem here: notice how the Contrast of Interest parameter was set as
Control-Treated
but your factor levels are named
Control and PA
For Contrast of Interest try using Control-PA instead to see if that is enough.
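The mismatch described above can be spotted with a small check like the following. It is a simplified sketch, not how the Galaxy tool or limma parses contrasts: the helper `unknown_contrast_levels` is hypothetical, and it assumes a plain two-level contrast written as `A-B`.

```python
def unknown_contrast_levels(contrast, levels):
    """Return the names in an 'A-B' contrast that are not known factor levels.

    Assumes a simple two-level difference; this is an illustration only and
    does not reproduce limma's own contrast parsing.
    """
    terms = [term.strip() for term in contrast.split("-")]
    return [term for term in terms if term not in levels]


levels = {"Control", "PA"}
print(unknown_contrast_levels("Control-Treated", levels))  # ['Treated']
print(unknown_contrast_levels("Control-PA", levels))       # []
```

Any name returned by the check is one the tool has no matching factor level for, which is the situation the "Treated" error message points to.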
Please give that a try and let us know how it works! I didn’t notice anything else off at this top level yet but we can follow up.
Hi @shrutikane
You are making progress on this! Would you be able to expand the third black box in your screenshot, the Tool Standard Error? We can see part of the message but not all of it.
Searching the Bioconductor support forum (which covers Limma) for the portion of the message we can see finds this → https://support.bioconductor.org/post/search/?query="names"+must+be+a+character+vector
And since the message also includes this
make.unique(geneanno[, 2])
it suggests that the second column of your annotation file (the Gene Annotations input in your screenshot, dataset 225) is where you will want to look next. How is that file formatted? You can click on the dataset to expand it. That will show the first few lines (“peek view”), and you can see the whole dataset using the eye icon.
Compare your file to the instructions on the tool form in the Help section.
Quote
Gene Annotations: Optional input for gene annotations, this can contain more information about the genes than just an ID number. The annotations will be available in the differential expression results table and the optional normalised counts table. They will also be used to generate interactive Glimma Volcano, MD plots and tables of differential expression. The input annotation file must contain a header row and have the gene IDs in the first column. The second column will be used to label the genes in the Volcano plot and interactive Glimma plots, additional columns will be available in the Glimma interactive table. The number of rows should match that of the counts files, add NA for any gene IDs with no annotation. The Galaxy tool annotateMyIDs can be used to obtain annotations for human, mouse, fly and zebrafish.
Example:
GeneID  Symbol  GeneName
11287   Pzp     pregnancy zone protein
11298   Aanat   arylalkylamine N-acetyltransferase
11302   Aatk    apoptosis-associated tyrosine kinase
11303   Abca1   ATP-binding cassette, sub-family A (ABC1), member 1
11304   Abca4   ATP-binding cassette, sub-family A (ABC1), member 4
11305   Abca2   ATP-binding cassette, sub-family A (ABC1), member 2
So, the tool is expecting three columns of data.
But reviewing your settings when using the annotateMyIDs tool, it seems that you chose to output four columns of data.
This is a screenshot of your annotateMyIDs run’s job information view.
The tool will expect that the first column in the featureCounts input files is the same as the first column in the annotateMyIDs file. For you, those are ENSEMBL gene identifiers.
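A quick way to compare the first columns of the two files outside Galaxy is sketched below. It is only an illustration under stated assumptions: both tables are tab-separated with a header row, the helper names `first_column` and `compare_ids` are hypothetical, and the small inline tables reuse gene IDs from the tool form example with made-up counts.

```python
def first_column(tsv_text):
    """Return first-column values of a tab-separated table, skipping the header."""
    lines = [line for line in tsv_text.splitlines() if line.strip()]
    return [line.split("\t")[0] for line in lines[1:]]


def compare_ids(counts_text, annotation_text):
    """Summarise how gene IDs in a counts table and an annotation table line up."""
    count_ids = first_column(counts_text)
    anno_ids = first_column(annotation_text)
    return {
        "missing_from_annotation": sorted(set(count_ids) - set(anno_ids)),
        "extra_in_annotation": sorted(set(anno_ids) - set(count_ids)),
        "same_row_count": len(count_ids) == len(anno_ids),
    }


# Hypothetical miniature inputs; the count values are invented.
counts = "Geneid\tsample1\n11287\t10\n11298\t0\n11302\t5\n"
annotation = (
    "GeneID\tSymbol\tGeneName\n"
    "11287\tPzp\tpregnancy zone protein\n"
    "11298\tAanat\tarylalkylamine N-acetyltransferase\n"
)
print(compare_ids(counts, annotation))
```

Here the annotation lacks a row for gene 11302, so the row counts differ; per the tool help, such genes should still get a row, filled with NA.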
What to try next
Rerun the annotateMyIDs tool, and when choosing the output columns to generate, output just the three columns the next tool is expecting.
|GeneID|Symbol|GeneName|
For you, that will be
ENSEMBL SYMBOL GENENAME
Question: why not manipulate the current output (cut columns)?
Answer: both tools, Limma and annotateMyIDs, seem to be reporting that they found “duplicates” in the gene identifier listings. That is probably because of a one-to-many mapping between ENSEMBL and ENTREZ identifiers.
So, check your output from the new run to make sure that isn’t reported again. If it is, you can run the tool again with the “Remove duplicates?” annotateMyIDs option set to YES.
Remove duplicates?
No
If this option is set to Yes, only the first occurrence of each input Gene ID will be kept. Default: No
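The "keep only the first occurrence" behavior described in that option can be pictured with a short sketch. This is not the annotateMyIDs implementation, just an illustration of the idea; the function name `keep_first_occurrence` and the placeholder rows are hypothetical.

```python
def keep_first_occurrence(rows):
    """Keep only the first row seen for each gene ID (column 0), preserving order."""
    seen = set()
    kept = []
    for row in rows:
        gene_id = row[0]
        if gene_id not in seen:
            seen.add(gene_id)
            kept.append(row)
    return kept


# Placeholder rows: GENE_A maps to two ENTREZ-style IDs (a one-to-many mapping).
rows = [
    ("GENE_A", "111", "SymA"),
    ("GENE_A", "222", "SymA"),
    ("GENE_B", "333", "SymB"),
]
print(keep_first_occurrence(rows))
# [('GENE_A', '111', 'SymA'), ('GENE_B', '333', 'SymB')]
```

After this kind of filtering each gene ID appears once, which is what lets the annotation rows line up one-to-one with the rows of the counts files.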
This is a good example of different data being processed through a step in one of our GTN tutorials. If you want to compare, this is the direct link to the step. → Hands-on: 2: RNA-seq counts to genes / 2: RNA-seq counts to genes / Transcriptomics
Please give that a try and let us know how it works!
The correction in the annotation file has worked. Thank you so much for your continuous guidance and help. I will be proceeding with further analysis now, and will post again if more help is required.
Thank you