Hi all,
After a power outage, I had trouble restarting my Galaxy instance using Ansible and ran into two issues:
1. Gunicorn socket (gunicorn.sock) not created
Setup
- Galaxy is run via galaxyctl.
- The systemd unit galaxy-gunicorn.service calls:
ExecStart=/home/galaxy/galaxy/.venv/bin/galaxyctl --config-file /home/galaxy/galaxy/config/galaxy.yml exec _default_ gunicorn
- In my galaxy.yml, I’ve defined:
gunicorn:
  bind: unix:/home/galaxy/galaxy/config/gunicorn.sock
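To rule out a problem with the path itself, a minimal Python sketch like the one below can be run as the galaxy user. The helper names (unix_socket_path, check_socket_dir) are my own and not part of Galaxy or Gunicorn, and the handling of the unix:// prefix is an assumption about which forms of bind value are accepted.

```python
import os


def unix_socket_path(bind):
    """Return the filesystem path of a 'unix:PATH' bind value, else None."""
    # Check the longer prefix first so 'unix:///x' does not yield '///x'.
    for prefix in ("unix://", "unix:"):
        if bind.startswith(prefix):
            return bind[len(prefix):]
    return None


def check_socket_dir(bind):
    """Report whether the current user could create the socket file."""
    path = unix_socket_path(bind)
    if path is None:
        return "not a unix socket bind"
    parent = os.path.dirname(path) or "."
    if not os.path.isdir(parent):
        return f"parent directory missing: {parent}"
    # Creating a file in a directory needs write and search permission.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"parent directory not writable by current user: {parent}"
    return "ok"
```

Calling check_socket_dir("unix:/home/galaxy/galaxy/config/gunicorn.sock") only reflects the permissions of the user running the script, so it needs to be run as galaxy to be meaningful.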
The issue
Despite Galaxy reporting as running (galaxyctl status shows active (running)), the expected socket file is not created:
$ sudo ls -l /home/galaxy/galaxy/config/gunicorn.sock
ls: cannot access: No such file or directory
Running lsof -U shows Gunicorn has active UNIX streams, but no named socket bound on the filesystem:
gunicorn 18361 galaxy 1u unix 0xffff98cc56883300 0t0 type=STREAM
...
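For anyone who wants to reproduce the check, here is a small Python sketch that tells apart a missing socket file, a leftover file nobody is listening on, and a live socket. The function name probe_unix_socket and its status strings are hypothetical. It uses only the standard library.

```python
import os
import socket
import stat


def probe_unix_socket(path, timeout=2.0):
    """Return 'missing', 'not-a-socket', 'stale' or 'listening' for path."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        # A connect that fails on an existing socket file usually means
        # that no process is accepting connections on it any more.
        client.connect(path)
    except OSError:
        return "stale"
    finally:
        client.close()
    return "listening"
```

In my case, probe_unix_socket("/home/galaxy/galaxy/config/gunicorn.sock") would be expected to return "missing", which matches the ls output above.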
What I’ve checked
- File permissions are correct (galaxy can write to config/).
- Socket path is valid and matches what is defined in galaxy.yml.
- No obvious errors in journalctl logs or stdout from galaxyctl.
- I have also tried to bring back the missing gunicorn.sock file with:
systemctl daemon-reload
systemctl restart gunicorn.service
2. Missing dependency_resolvers_conf.xml
I’m also seeing this in the logs at startup:
galaxyctl[259655]: galaxy.tool_util.deps DEBUG 025-07-09 12:35:41,943 [pN:main,p:259655,tN:MainThread] Unable to find config file '/home/galaxy/galaxy/config/dependency_resolvers_conf.xml'
That would be fine (the file is optional), but immediately after this, the following error occurs:
galaxy.util.filelock.FileLockException: Timeout occurred.
Exception: Failed to get file lock for /home/galaxy/tool_dependencies/conda
This seems to come from:
with FileLock(..., timeout=300):
So, even though dependency_resolvers_conf.xml is optional, its absence may be triggering Galaxy to fall back to conda, which in turn tries to acquire a lock on /home/galaxy/tool_dependencies/conda.lock and fails because the file already exists (left behind?) or another process is stuck.
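To illustrate the "left behind" hypothesis, here is a minimal sketch of an exclusive-create lock file. It is not Galaxy's FileLock implementation. I am assuming the real lock behaves in a similar way, and the function names are hypothetical. It shows how a lock file that survives a crash or power cut makes every later attempt time out until the file is removed.

```python
import os
import time


def try_exclusive_lock(lock_path, timeout=5.0, delay=0.1):
    """Create lock_path atomically; True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_RDWR)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(delay)
        else:
            os.close(fd)
            return True


def lock_age_seconds(lock_path):
    """Seconds since the lock file was last modified, or None if absent."""
    try:
        return time.time() - os.stat(lock_path).st_mtime
    except FileNotFoundError:
        return None
```

If lock_age_seconds("/home/galaxy/tool_dependencies/conda.lock") reports an age that predates the power outage and no Galaxy or conda process is running, the file is probably stale. Whether it is safe to delete is something I would like to have confirmed.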
Questions
- Has anyone seen gunicorn.sock not created despite a valid bind path and no startup error?
- Should we be explicitly creating dependency_resolvers_conf.xml even though it’s optional?
- Can a failed Conda lock prevent Gunicorn from binding or finalizing startup?
Thanks in advance for any insights you can offer! I’m happy to provide logs, config excerpts, or run debugging commands if helpful.
Best,
— Naïra