EGSEA help - Error in plotHeatMapsLogFC

incho · April 16, 2021, 6:49pm

Hello guys,

I am running EGSEA, and it was working fine until I started to run into errors.

My counts matrix is in tabular format:

Geneid	FFSSCLCR3	FFSSCLCR2	FFSSCLCR1	iPSC2	iPSC3
2018	2867.300115	4384.063805	3697.321171	0	0
196047	2139.832269	3254.124667	2938.558938	0	3.544768899
27063	6174.085968	2953.349501	6630.032231	7.378716416	5.317153348

My factors file is also tabular:

Sample	Cell Type
FFSSCLCR3	FFSSCLC
FFSSCLCR2	FFSSCLC
FFSSCLCR1	FFSSCLC
iPSC1	iPSC
iPSC2	iPSC
iPSC3	iPSC

And my annotation (symbols mapping) file is tabular as well:

ENTREZID	SYMBOL
100287102	DDX11L1
653635	WASH7P
102466751	MIR6859-1
100302278	MIR1302-2
645520	FAM138A
79501	OR4F5
729737	LOC729737
102725121	DDX11L17

But when I try to run EGSEA, it reports:
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
EGSEA analysis has started
Log fold changes are estimated using limma package …
limma DE analysis is carried out …
EGSEA is running on the provided data and h collection
…globaltestcamera.ora*
EGSEA is running on the provided data and c5 collection
…cameraglobaltestora*
EGSEA is running on the provided data and gsdbgo collection
…globaltestcamera.ora*
EGSEA is running on the provided data and kegg collection
…globaltestcamera.ora*
EGSEA analysis took 12.34 seconds.
EGSEA analysis has completed
EGSEA HTML report is being generated …
Report pages and figures are being generated for the h collection …
Heat maps are being generated for top-ranked gene sets
based on logFC …
Error in plotHeatMapsLogFC(gene.sets = gsets.top, fc = logFC, limma.tops = limma.tops, :
All featureIDs in the gs.annot list should map to
a valid gene symbol
Calls: egsea.cnt → egsea

I tried to troubleshoot this but gave up after three days.
Can anyone help me figure out what I am doing wrong here?

Thank you in advance

incho · April 22, 2021, 1:30am

Hello guys,

Could it be because I am using hg38?
Should I try with hg19 and run EGSEA again?

I tried removing all “NA”, “LOC###”, and “XXX-AS” genes from my list. It still doesn’t run, and I am running out of ideas for how to make it run.

Thank you,

Filip_Filipsky · May 17, 2021, 4:57pm

I am experiencing the same problem. Did you manage to sort it out?

jennaj · May 17, 2021, 7:04pm

@Filip_Filipsky @incho

The problem is reported in the error message: all featureIDs in the gs.annot list should map to a valid gene symbol.

Tool form instructions:

Symbols Mapping file

A file containing the Gene Symbol for each Entrez Gene ID. The first column must be the Entrez Gene IDs and the second column must be the Gene Symbols. It is used for the heatmap visualization. The number of rows should match that of the Counts Matrix.

Checking that both datasets contain the same number of rows is a basic check, but you could also compare the IDs and make sure they match. You might also need to adjust the header lines in the files – these deviate from the example data.
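If you want to run that comparison outside Galaxy, here is a minimal Python sketch. It assumes both files are tab-separated with one header row and the Entrez Gene ID in the first column. The function names (`read_first_column`, `compare_ids`) and the toy data are made up for illustration and are not part of EGSEA or Galaxy. Whether row order matters to the tool is not something I can confirm, so the order check is informational only.

```python
import csv
import io


def read_first_column(tsv_text):
    """Return first-column values of a tab-separated table, skipping the header row."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    return [row[0].strip() for row in rows[1:] if row and row[0].strip()]


def compare_ids(counts_text, mapping_text):
    """Compare the gene IDs of a counts matrix against a symbols mapping file."""
    count_ids = read_first_column(counts_text)
    map_ids = read_first_column(mapping_text)
    return {
        "n_counts": len(count_ids),
        "n_mapping": len(map_ids),
        # IDs with counts but no symbol row
        "missing_from_mapping": sorted(set(count_ids) - set(map_ids)),
        # IDs with a symbol row but no counts
        "missing_from_counts": sorted(set(map_ids) - set(count_ids)),
        # Informational: are the IDs listed in the same order?
        "same_order": count_ids == map_ids,
    }


# Toy data (hypothetical counts; the mapping is deliberately missing one ID).
counts = "Geneid\tS1\tS2\n653635\t10\t12\n79501\t0\t3\n645520\t5\t7\n"
mapping = "ENTREZID\tSYMBOL\n653635\tWASH7P\n79501\tOR4F5\n"

report = compare_ids(counts, mapping)
for key, value in report.items():
    print(f"{key}: {value}")
```

To use it on real files, read each file's contents into a string (for example with `open(path).read()`) and pass both strings to `compare_ids`. Any ID listed under `missing_from_mapping` is a candidate cause for a "should map to a valid gene symbol" failure.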

Some tools have built-in expectations for how the data are labeled in headers that the Galaxy wrapper around the tool cannot auto-adjust. This can lead to spurious error messages. So, I’d recommend following the same header labeling as in the examples just to rule that out as a problem, too. The sample names can differ of course, but columns of data representing gene/symbol information could have standardized labels in the first row.

Using the most current version of any tool is also usually important, to capture bug fixes and wrapper enhancements, and to make further troubleshooting easier. If you are not working at a usegalaxy.* server, you might want to try the tool at one of those as a comparison. UseGalaxy.org and UseGalaxy.eu are the best choices – UseGalaxy.org.au usually follows the same server (but not cluster) configuration as UseGalaxy.eu so testing/comparing between those is not normally needed.

If the tool still fails after those adjustments are made/confirmed, please confirm:

The URL of the usegalaxy.* server where you tested.
The complete tool name and version – find this info at the top of the tool form.

Let’s start there. We may ask for a history share link to help more, and that can be posted back if you are not concerned about keeping the data private, or let us know if you do want to keep it private and a moderator will start a private message thread for sharing. If there is some bug with the tool, we can help to sort that out versus a usage issue.

incho · May 19, 2021, 6:54pm

Jennaj,

Thank you for the reply. I ended up ditching the dataset. However, I will follow your recommendation and try to reanalyze the data using the same headers as the example.
I first thought that I was getting the errors because of NAs and LOC### and so on.
Hopefully, it is just the header.

I will update the thread for @Filip_Filipsky too.

Thanks,