Error when running tool: ModuleNotFoundError: No module named 'pandas', even though pandas is in requirements

When I try to run my tool (I am the developer), I get the following error:

Traceback (most recent call last):
  File "/data/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/onnodg/cdhit_analysis/00d56396b32a/cdhit_analysis/cdhit_analysis.py", line 8, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I did list pandas in the requirements. A different tool that I made, which also uses pandas, does not run into this problem.
Here is a link to my tool: Galaxy | Tool Shed
Requirements in my XML file:

<requirements>
    <requirement type="package" version="3.12.3">python</requirement>
    <requirement type="package" version="3.10.6">matplotlib</requirement>
    <requirement type="package" version="2.3.2">pandas</requirement>
    <requirement type="package" version="3.1.5">openpyxl</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
python '$__tool_directory__/cdhit_analysis.py'

And here is a link to the tool that also uses pandas but does not run into this error: Galaxy | Tool Shed

Requirements in its XML file:

<requirements>
    <requirement type="package" version="3.12.3">python</requirement>
    <requirement type="package" version="3.10.6">matplotlib</requirement>
    <requirement type="package" version="2.3.2">pandas</requirement>
    <requirement type="package" version="2.3.2">numpy</requirement>
    <requirement type="package" version="3.1.5">openpyxl</requirement>
</requirements>
<command detect_errors="exit_code"><![CDATA[
python '$__tool_directory__/blast_annotations_processor.py'

I have no clue why this error is occurring; I had no problems with the tool when testing and linting it with Planemo.

Note: I am running both tools on a private Galaxy instance, but since only one of them runs into the error, I don't think that is the cause.
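A minimal sketch, not part of the original tool: a helper (the name `require_modules` is hypothetical) that could sit at the top of a script like `cdhit_analysis.py` so that a missing dependency fails with the path of the interpreter actually running the job. The bare traceback above does not show which Python was used, and that is usually the key question behind a `ModuleNotFoundError` for a declared requirement.

```python
import importlib
import sys


def require_modules(names):
    """Import each module by name; exit with a readable message if any are missing."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    if missing:
        # Report which interpreter is running: a missing declared dependency
        # usually means this is not the Python from the tool's conda environment.
        sys.exit(
            f"Missing module(s): {', '.join(missing)}\n"
            f"Interpreter: {sys.executable} (Python {sys.version.split()[0]})"
        )
    return True
```

Calling `require_modules(["pandas", "matplotlib", "openpyxl"])` before the real imports would make the job's stderr name the interpreter path. With `detect_errors="exit_code"`, the non-zero exit from `sys.exit` still marks the job as failed.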

Hi @Onnodg

Thanks for sharing all the details!

To clarify, when you run the tool test with Planemo with a dev version of Galaxy, it works, but on your server, it doesn’t, correct? Was your test with the same release of Galaxy as your server?

For something simple, you could check the actual production runtime environment with something like

conda list -p <job environment prefix> | grep pandas

The tool XML looks fine (unless you also need to declare numpy?), so you’ll need to examine the server’s job runtime environment more closely.
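Without shell access, one way to examine the job runtime environment is to have the job report on itself. This is a minimal sketch, assuming the activated conda environment sets `CONDA_PREFIX` (as `conda activate` normally does). It compares that prefix with `sys.prefix` of the interpreter actually running. The function name is hypothetical, and its optional parameters exist only so the check can be exercised with sample values.

```python
import os
import sys


def interpreter_matches_conda_env(environ=None, prefix=None):
    """Compare the running interpreter's prefix with the activated conda env.

    Returns (matches, details). `environ` and `prefix` default to the live
    process values.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    conda_prefix = environ.get("CONDA_PREFIX")
    details = {
        "executable": sys.executable,
        "sys.prefix": prefix,
        "CONDA_PREFIX": conda_prefix,
    }
    if not conda_prefix:
        # No activated conda environment recorded for this process.
        return False, details
    # realpath so that symlinked environment directories still compare equal.
    same = os.path.realpath(prefix) == os.path.realpath(conda_prefix)
    return same, details


if __name__ == "__main__":
    ok, info = interpreter_matches_conda_env()
    for key, value in info.items():
        print(f"{key}: {value}")
    print("interpreter is from the active conda env" if ok else "MISMATCH")
```

A mismatch would mean that `python` on the job's `PATH` is not the environment's interpreter, even though an environment is nominally active.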

Let’s start there, thanks! :slight_smile:

Hi @jennaj

Thank you for your response. The testing environment should be the same as the one on the Galaxy server. I don't have shell access, and updating my tools requires someone in my organisation to install the new revision on our Galaxy server, so debugging can take a few days. I did find something that I think might be the cause of the error: I left a faulty shebang in the tool. That would explain why it works in my local environment but not on the server.
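A hedged sketch for spotting this kind of problem: the helper below (hypothetical names `classify_shebang` and `check_script`) reads a script's first line and reports whether it uses `#!/usr/bin/env ...`, which resolves the interpreter through `PATH`, or a hard-coded interpreter path such as `#!/usr/bin/python3`, which always bypasses the conda environment. Note that a shebang is only consulted when the script is executed directly. A command like `python '$__tool_directory__/cdhit_analysis.py'` uses whichever `python` comes first on `PATH`, regardless of the shebang.

```python
def classify_shebang(first_line):
    """Classify a script's first line as 'none', 'env' or 'hardcoded'."""
    line = first_line.strip()
    if not line.startswith("#!"):
        return "none"
    parts = line[2:].split()
    if not parts:
        return "none"
    if parts[0].rsplit("/", 1)[-1] == "env":
        # e.g. "#!/usr/bin/env python3": the interpreter is looked up on PATH,
        # so it follows whichever environment is active for the job.
        return "env"
    # e.g. "#!/usr/bin/python3": always that exact interpreter.
    return "hardcoded"


def check_script(path):
    """Read only the first line of a script and classify its shebang."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return classify_shebang(handle.readline())
```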

I will test this and report back on it.

Kind regards,
Onno


Hi @jennaj

I added this sanity check to the tool:

#!/bin/bash

SCRIPTDIR=$(dirname "$(readlink -f "$0")")
python "$SCRIPTDIR/cdhit_analysis.py" "$@"

# sanity check
printf "Conda env: %s\n" "$CONDA_DEFAULT_ENV"
printf "Python version: %s\n" "$(python --version | awk '{print $2}')"
printf "Matplotlib version: %s\n" "$(python -c 'import matplotlib; print(matplotlib.__version__)')"
printf "Pandas version: %s\n" "$(python -c 'import pandas; print(pandas.__version__)')"
printf "Openpyxl version: %s\n" "$(python -c 'import openpyxl; print(openpyxl.__version__)')"
printf "Bash version: %s\n" "${BASH_VERSION}"
printf "SCRIPTDIR: %s\n\n" "$SCRIPTDIR"
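The same sanity check can be written in Python so that a missing package shows up as an explicit value instead of a blank line (in the bash version, the failed `import` goes to stderr and `printf` prints an empty string). This is a minimal sketch, not the author's script. It assumes the installed distribution names match the import names, which holds for matplotlib, pandas and openpyxl.

```python
import importlib.metadata
import platform
import sys

PACKAGES = ("matplotlib", "pandas", "openpyxl")


def version_report(packages=PACKAGES):
    """Return interpreter details plus {package: version or None}."""
    report = {
        "executable": sys.executable,
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Not installed for *this* interpreter.
            report[name] = None
    return report


if __name__ == "__main__":
    for key, value in version_report().items():
        print(f"{key}: {value if value is not None else 'NOT INSTALLED'}")
```

Including `sys.executable` in the output shows which interpreter the job actually ran, not just its version.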

Here is the stdout when I run the tool with Planemo:

Processing complete. Processed 24 clusters.
Conda env: mulled-v1-6ade70589d4d1d48b3cac63fe89463272158608d42a29feab5a9df7545b7aed8
Python version: 3.12.3
Matplotlib version: 3.10.6
Pandas version: 2.3.2
Openpyxl version: 3.1.5
Bash version: 5.2.21(1)-release
SCRIPTDIR: {path to tool on my laptop}

And here is the stdout when I run the tool on Galaxy:

Conda env: mulled-v1-6ade70589d4d1d48b3cac63fe89463272158608d42a29feab5a9df7545b7aed8
Python version: 3.10.12
Matplotlib version: 
Pandas version: 
Openpyxl version: 
Bash version: 5.1.16(1)-release
SCRIPTDIR: /data/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/onnodg/cdhit_analysis/c6981ea453ae/cdhit_analysis

My Galaxy instance runs on Galaxy version 24.1.5.dev0

With Planemo, the test uses Galaxy version v25.0.3-2116-g0420a1da7d.
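To compare the two outputs above line by line, a small parser helps. This is a minimal sketch with hypothetical helper names. It splits each `Key: value` line at the first colon, treats empty values as missing, and ignores `SCRIPTDIR`, since that path is expected to differ. The two sample strings are copied from the outputs above.

```python
def parse_report(text):
    """Parse 'Key: value' lines into a dict; empty values become None."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. "Processing complete. Processed 24 clusters."
        result[key.strip()] = value.strip() or None
    return result


def diff_reports(expected, actual, ignore=("SCRIPTDIR",)):
    """Return {key: (expected, actual)} for every key whose values differ."""
    keys = (set(expected) | set(actual)) - set(ignore)
    return {
        k: (expected.get(k), actual.get(k))
        for k in sorted(keys)
        if expected.get(k) != actual.get(k)
    }


planemo = """Processing complete. Processed 24 clusters.
Conda env: mulled-v1-6ade70589d4d1d48b3cac63fe89463272158608d42a29feab5a9df7545b7aed8
Python version: 3.12.3
Matplotlib version: 3.10.6
Pandas version: 2.3.2
Openpyxl version: 3.1.5
Bash version: 5.2.21(1)-release
SCRIPTDIR: {path to tool on my laptop}"""

galaxy = """Conda env: mulled-v1-6ade70589d4d1d48b3cac63fe89463272158608d42a29feab5a9df7545b7aed8
Python version: 3.10.12
Matplotlib version:
Pandas version:
Openpyxl version:
Bash version: 5.1.16(1)-release
SCRIPTDIR: /data/galaxy/var/shed_tools/toolshed.g2.bx.psu.edu/repos/onnodg/cdhit_analysis/c6981ea453ae/cdhit_analysis"""

for key, (exp, act) in diff_reports(parse_report(planemo), parse_report(galaxy)).items():
    print(f"{key}: expected {exp!r}, got {act!r}")
```

On these two outputs, the conda environment name is identical while the Python version differs and all three packages are missing. That pattern suggests the environment was nominally activated but its interpreter was not the one being run.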

Is this helpful?

Hi @Onnodg

Glad to see that you are making progress!

It looks to me like your server’s job runtime environment is not making the dependencies available to your script’s import statements.

Try using Conda (or Bioconda) instead of Python for the dependency resolution. This is good for you but also for anyone else using the tool (and good for your cluster administrators!). I’m not sure where you sourced the mulled environment, but you might be able to find one that already includes all of the dependencies and reuse it. Or, possibly, create a new one for your tool(s)?

The tutorials here cover these details, and you can use Planemo to create and test. I would have expected Planemo to have already shown a message about this, but maybe I’m thinking of an older version, or you are not using it yet for this part (the runtime environment configuration?).

Also, I’m sure you know this, but some of these newer dependency releases might have complications when used together, meaning they might produce different scientific results when combined in certain ways. I noticed a few of these reported in the matplotlib and pandas release notes when I originally reviewed them for known conflicts a few weeks ago. You know your function calls better than I do, so be sure to review those, choose a combination, and then specify it in your environment.

Controlling for all dependencies/versions is pretty common. This ensures that no matter where that tool is executed the “same” result comes out (excluding some minor platform math differences). The guide above about containers discusses this more plus how to add in some tests to make sure everything is set up as expected (distinct from the Galaxy wrapper itself or those tool-tests).
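As one example of such a check, here is a hedged sketch of a self-test that reads the pinned `<requirement type="package">` versions from the tool XML and compares them with what the running interpreter actually has. The function names are hypothetical. It assumes each conda package name matches its Python distribution name (true for python, matplotlib, pandas, numpy and openpyxl, but not for every package) and that pins are exact versions.

```python
import importlib.metadata
import platform
import xml.etree.ElementTree as ET


def pinned_requirements(tool_xml):
    """Map package name -> pinned version from <requirement type="package"> elements."""
    root = ET.fromstring(tool_xml)
    return {
        req.text.strip(): req.get("version")
        for req in root.iter("requirement")
        if req.get("type") == "package" and req.text
    }


def installed_version(name):
    """Version visible to the running interpreter, or None if absent."""
    if name == "python":
        return platform.python_version()
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def check_pins(tool_xml):
    """Return {name: (pinned, installed)} for every pin the runtime does not match."""
    problems = {}
    for name, pinned in pinned_requirements(tool_xml).items():
        found = installed_version(name)
        if found != pinned:
            problems[name] = (pinned, found)
    return problems
```

An empty result from `check_pins` means every declared pin matched exactly. On the failing server above, it would have reported `python` as `('3.12.3', '3.10.12')` and each library as missing.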

Hope this helps! You are getting closer! :slight_smile:

Hi @jennaj

The Galaxy admins at my workplace updated to a new Galaxy version yesterday. They also installed a newer Conda version, after which all tool dependencies and environments were rebuilt. This seems to have fixed my problem.

Thanks a lot for all your help!

Best,
Onno


Oh great @Onnodg! I’m really glad this worked out for you. Getting the dependency situation organized is a big step to having a stable server environment. Thanks for letting us know! :rocket: