I have been running EGSEA, and it was working fine until I started running into errors.
My counts matrix is in tabular format:
Geneid  FFSSCLCR3    FFSSCLCR2    FFSSCLCR1    iPSC1  iPSC2        iPSC3
2018    2867.300115  4384.063805  3697.321171  0      0            0
196047  2139.832269  3254.124667  2938.558938  0      0            3.544768899
27063   6174.085968  2953.349501  6630.032231  0      7.378716416  5.317153348
I also have a factors file in tabular format:
Sample     Cell Type
FFSSCLCR3  FFSSCLC
FFSSCLCR2  FFSSCLC
FFSSCLCR1  FFSSCLC
iPSC1      iPSC
iPSC2      iPSC
iPSC3      iPSC
And my annotation file is tabular as well:
ENTREZID   SYMBOL
100287102  DDX11L1
653635     WASH7P
102466751  MIR6859-1
100302278  MIR1302-2
645520     FAM138A
79501      OR4F5
729737     LOC729737
102725121  DDX11L17
But when I try to run EGSEA, it reports:
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
EGSEA analysis has started
Log fold changes are estimated using limma package …
limma DE analysis is carried out …
EGSEA is running on the provided data and h collection
…globaltestcamera.ora*
EGSEA is running on the provided data and c5 collection
…cameraglobaltestora*
EGSEA is running on the provided data and gsdbgo collection
…globaltestcamera.ora*
EGSEA is running on the provided data and kegg collection
…globaltestcamera.ora*
EGSEA analysis took 12.34 seconds.
EGSEA analysis has completed
EGSEA HTML report is being generated …
Report pages and figures are being generated for the h collection …
Heat maps are being generated for top-ranked gene sets
based on logFC …
Error in plotHeatMapsLogFC(gene.sets = gsets.top, fc = logFC, limma.tops = limma.tops, :
All featureIDs in the gs.annot list should map to
a valid gene symbol
Calls: egsea.cnt → egsea
I tried to troubleshoot it, but gave up after three days of trying.
Can anyone help me figure out what I am doing wrong here?
The problem is reported in the error message: all featureIDs in the gs.annot list should map to a valid gene symbol.
The tool form instructions say:
Symbols Mapping file
A file containing the Gene Symbol for each Entrez Gene ID. The first column must be the Entrez Gene IDs and the second column must be the Gene Symbols. It is used for the heatmap visualization. The number of rows should match that of the Counts Matrix.
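One way to meet that row-count requirement is to build the symbols file from the counts matrix itself, so both files list the same IDs in the same order. Below is a minimal Python sketch, assuming both files are tab-separated with one header row and the Entrez ID in the first column. The function names (read_tsv, write_tsv, align_symbols) are my own and are not part of EGSEA or Galaxy. As a design choice, genes with no usable symbol (empty or NA) are dropped from both outputs, which removes them from the analysis, and when an ID is listed twice the first symbol wins.

```python
import csv
from typing import List, Tuple

Rows = List[List[str]]


def read_tsv(path: str) -> Rows:
    """Read a tab-separated file into rows; rows[0] is the header. Blank lines are skipped."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def write_tsv(path: str, rows: Rows) -> None:
    """Write rows back out as tab-separated text."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)


def align_symbols(count_rows: Rows, annot_rows: Rows) -> Tuple[Rows, Rows, List[str]]:
    """Pair each counts row with its symbol, in counts-matrix order.

    Returns (kept count rows, symbol mapping rows, dropped IDs). Both returned
    tables keep their header row, so they have the same number of rows.
    """
    # Entrez ID -> first usable symbol; empty and "NA" symbols are ignored.
    symbol_of = {}
    for row in annot_rows[1:]:
        if len(row) >= 2 and row[1].strip() not in ("", "NA"):
            symbol_of.setdefault(row[0].strip(), row[1].strip())

    kept = [count_rows[0]]
    mapping = [annot_rows[0][:2]]
    dropped = []
    for row in count_rows[1:]:
        gene_id = row[0].strip()
        if gene_id in symbol_of:
            kept.append(row)
            mapping.append([gene_id, symbol_of[gene_id]])
        else:
            # No usable symbol: leave this gene out of both files.
            dropped.append(gene_id)
    return kept, mapping, dropped
```

For example, `kept, mapping, dropped = align_symbols(read_tsv("counts.tsv"), read_tsv("symbols.tsv"))`, then write `kept` and `mapping` with `write_tsv` and upload both (the file names are placeholders). Checking `dropped` first tells you how many genes would be lost.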
Checking that both datasets contain the same number of rows is a basic check, but you could also compare the IDs and make sure they match. You might also need to adjust the header lines in the files, since these deviate from the example data.
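If you want to run that comparison locally before uploading, here is a hedged Python sketch. It assumes one header row and the ID in the first column of both files, and the names are illustrative. It also reports a row-order difference; I have not confirmed whether the wrapper pairs rows by position or by ID, so treat an order mismatch as worth fixing rather than as the confirmed cause.

```python
import csv
from collections import Counter
from typing import List

Rows = List[List[str]]


def read_tsv(path: str) -> Rows:
    """Read a tab-separated file into rows; rows[0] is the header. Blank lines are skipped."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def compare_ids(count_rows: Rows, annot_rows: Rows) -> List[str]:
    """Return a list of problems; an empty list means the two files line up."""
    count_ids = [row[0].strip() for row in count_rows[1:]]
    annot_ids = [row[0].strip() for row in annot_rows[1:]]
    problems = []

    if len(count_ids) != len(annot_ids):
        problems.append(
            f"row counts differ: {len(count_ids)} in counts, {len(annot_ids)} in symbols file"
        )

    dups = sorted(i for i, n in Counter(annot_ids).items() if n > 1)
    if dups:
        problems.append(f"duplicated IDs in symbols file: {dups}")

    only_counts = sorted(set(count_ids) - set(annot_ids))
    only_annot = sorted(set(annot_ids) - set(count_ids))
    if only_counts:
        problems.append(f"IDs missing from symbols file: {only_counts}")
    if only_annot:
        problems.append(f"IDs not in counts matrix: {only_annot}")

    # Only worth reporting once the ID sets themselves agree.
    if not problems and count_ids != annot_ids:
        problems.append("same IDs, but in a different row order")
    return problems
```

For example, `print(compare_ids(read_tsv("counts.tsv"), read_tsv("symbols.tsv")))` with your own file names.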
Some tools have built-in expectations for how the data are labeled in headers that the Galaxy wrapper around the tool cannot auto-adjust, and this can lead to spurious error messages. So I’d recommend using the same header labels as the example data, just to rule that out as a cause. The sample names can differ, of course, but the columns holding gene ID and symbol information should carry the same standardized labels in the first row.
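If you would rather not edit headers by hand, a small Python sketch like this one rewrites only the leading header fields and leaves the sample names alone. The labels you pass in should be copied from the tool's example datasets; I have not listed them here, and the function names are hypothetical.

```python
import csv
from typing import List, Sequence

Rows = List[List[str]]


def relabel_header(rows: Rows, labels: Sequence[str]) -> Rows:
    """Return a copy of rows with the first len(labels) header fields replaced.

    Header fields after those (e.g. sample names) are kept unchanged.
    """
    if not rows:
        raise ValueError("file has no header row")
    header = list(rows[0])
    if len(labels) > len(header):
        raise ValueError(f"{len(labels)} labels given but header has {len(header)} columns")
    header[: len(labels)] = list(labels)
    return [header] + [list(row) for row in rows[1:]]


def relabel_file(src: str, dst: str, labels: Sequence[str]) -> None:
    """Read a tab-separated file, relabel its header, and write the result to dst."""
    with open(src, newline="") as handle:
        rows = [row for row in csv.reader(handle, delimiter="\t")]
    with open(dst, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(relabel_header(rows, labels))
```

For example, `relabel_file("counts.tsv", "counts_relabelled.tsv", [id_label])`, where `id_label` is whatever the first-column label is in the matching example file.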
Using the most current version of any tool is also usually important: it captures bug fixes and wrapper enhancements, and it makes further troubleshooting easier. If you are not working on a usegalaxy.* server, you might want to try the tool on one of those for comparison. UseGalaxy.org and UseGalaxy.eu are the best choices; UseGalaxy.org.au usually follows the same server (but not cluster) configuration as UseGalaxy.eu, so testing on both of those is not normally needed.
If the tool still fails after those adjustments are made and confirmed, please send us:
- The URL of the usegalaxy.* server where you tested.
- The complete tool name and version (find this info at the top of the tool form).
Let’s start there. We may ask for a history share link to help further. You can post it here if you are not concerned about keeping the data private; if you do want to keep it private, let us know and a moderator will start a private message thread for sharing. If it turns out to be a bug in the tool rather than a usage issue, we can help sort that out.
Thank you for the reply. I ended up ditching the dataset, but I will follow your recommendation and try to reanalyze the data using the same headers as the example.
At first I thought I was getting the errors because of NA values, LOC### symbols, and so on.
Hopefully, it is just the header.
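To check the NA/LOC suspicion directly, a short Python sketch can list which IDs in the symbols file have an empty or NA-style symbol, a LOC-style symbol, or a repeated ID. The set of "missing" spellings and the LOC pattern are assumptions; I don't know exactly which values EGSEA's gene-symbol check rejects. LOC symbols such as LOC729737 are real Entrez symbols, so they are flagged for inspection only.

```python
import csv
import re
from typing import Dict, List

Rows = List[List[str]]

# Assumed spellings of "no symbol"; adjust to what your file actually contains.
MISSING_SYMBOLS = {"", "NA", "N/A", "NaN", "null"}
# "LOC" followed by digits, e.g. LOC729737.
LOC_SYMBOL = re.compile(r"^LOC\d+$")


def read_tsv(path: str) -> Rows:
    """Read a tab-separated file into rows; rows[0] is the header. Blank lines are skipped."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t") if row]


def classify_symbols(annot_rows: Rows) -> Dict[str, List[str]]:
    """Group Entrez IDs (column 1) by the kind of symbol they have (column 2).

    Assumes annot_rows[0] is the header row.
    """
    report: Dict[str, List[str]] = {"missing": [], "loc": [], "duplicate_id": []}
    seen = set()
    for row in annot_rows[1:]:
        if not row:
            continue
        gene_id = row[0].strip()
        symbol = row[1].strip() if len(row) > 1 else ""
        if gene_id in seen:
            report["duplicate_id"].append(gene_id)
        seen.add(gene_id)
        if symbol in MISSING_SYMBOLS:
            report["missing"].append(gene_id)
        elif LOC_SYMBOL.match(symbol):
            report["loc"].append(gene_id)
    return report
```

For example, `report = classify_symbols(read_tsv("symbols.tsv"))`, then inspect `report["missing"]` first, since those IDs have no symbol at all.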