When I try to run my tool (I am the developer), I get the following error:
Traceback (most recent call last):
  File "/data/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/onnodg/cdhit_analysis/00d56396b32a/cdhit_analysis/cdhit_analysis.py", line 8, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'
I did list pandas in the requirements. A different tool that I made, which also uses pandas, does not run into this problem.
Here is a link to my tool: Galaxy | Tool Shed
Requirements in my XML file:
To clarify, when you run the tool test with Planemo with a dev version of Galaxy, it works, but on your server, it doesn’t, correct? Was your test with the same release of Galaxy as your server?
Thank you for your response. The testing environment should be the same as on the Galaxy server. I don’t have shell access, and updating my tools requires someone in my organisation to install the new revision on our Galaxy server, so debugging can take some days. I did find something I think might be the cause of the error: I left a faulty shebang in the tool. That would explain why it works in my local environment and not on the server.
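One way to check for a shebang problem like this without shell access is to have the script report which interpreter is running it. The following is a minimal sketch, not part of the original tool: `describe_runtime` is a hypothetical helper name, and it assumes the diagnostics are written to stderr so they appear in the job's error output.

```python
import importlib.util
import sys


def describe_runtime(modules):
    """Return the interpreter path and, per module, whether it can be found.

    Only top-level module names are expected here (e.g. "pandas").
    """
    availability = {
        name: importlib.util.find_spec(name) is not None for name in modules
    }
    return {"interpreter": sys.executable, "modules": availability}


if __name__ == "__main__":
    report = describe_runtime(["pandas"])
    # Written to stderr so the information lands in the job's error log.
    print("Interpreter:", report["interpreter"], file=sys.stderr)
    for name, found in report["modules"].items():
        print(f"{name}: {'found' if found else 'MISSING'}", file=sys.stderr)
```

If the interpreter path points at a system Python rather than the environment created for the tool, a hard-coded shebang (or a command that calls the script directly instead of through `python`) is a likely suspect.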
It looks to me like your server’s job runtime environment is not installing the dependencies that the import statements rely on.
Try using Conda (or Bioconda) instead of Python for the dependency resolution. This is good for you but also for anyone else using the tool (and good for your cluster administrators!). I’m not sure where you sourced the mulled environment but you might be able to find one that includes all of the dependencies already and you could reuse it. Or, possibly create a new one for your tool(s)?
The tutorials here cover these details, and you can use Planemo to create and test. I would have expected Planemo to have already presented a message about this, but maybe I’m thinking of an older version, or you are not using it yet for this part (the runtime environment configuration?).
Also, I’m sure you know this, but some of these newer dependency releases might have complications when used together. Meaning, they might produce different scientific results when combined in certain ways. I noticed a few of these reported in the release notes on the matplotlib and pandas websites when I was first reviewing for obvious known conflicts a few weeks ago. You will know your function calls better than I will, so be sure to review those, choose a combination, then specify that in your environment.
Controlling for all dependencies/versions is pretty common. This ensures that no matter where that tool is executed the “same” result comes out (excluding some minor platform math differences). The guide above about containers discusses this more plus how to add in some tests to make sure everything is set up as expected (distinct from the Galaxy wrapper itself or those tool-tests).
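As an illustration of that kind of check, here is a minimal sketch that compares the installed package versions against the pins a tool expects. The helper name `check_pins` and the version numbers are hypothetical placeholders, not values taken from the tool in this thread; it only relies on the standard library's `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata


def check_pins(pins):
    """Compare installed package versions against expected pins.

    Returns a dict mapping package name to (expected, installed) for every
    package that is missing (installed is None) or differs from its pin.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    # Hypothetical pins; replace with the versions your tool declares.
    expected_pins = {"pandas": "2.2.2", "matplotlib": "3.9.0"}
    for name, (expected, installed) in check_pins(expected_pins).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Running something like this as part of a tool test would flag an environment that was resolved with different versions than the ones the results were validated against.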
The Galaxy admins at my workplace updated to a new Galaxy version yesterday. They also installed a newer Conda version, after which all tool dependencies and environments were rebuilt. This seems to have fixed my problem.
Oh great @Onnodg! I’m really glad this worked out for you. Getting the dependency situation organized is a big step to having a stable server environment. Thanks for letting us know!