Ok, here’s how the bug happens:
- Galaxy EU launches a job
- It runs on a cluster somewhere
- (A) This job contacts https://usegalaxy.eu/apollo_api/
- (B) That server passes the request to https://apollo.internal/apollo_api/
- which passes it on to http://localhost:8080/apollo/
- The creation script crashes
- And never reaches the step to add permissions for your user to access the organism.

The request to Apollo takes more than 60 seconds, so proxy B reports that it timed out, proxy A fails it as well, and the failure ends up back in the job.
We (@abretaud and I) proxy requests to /organism/findAllOrganisms through apolpi (galaxy-genome-annotation/apolpi on GitHub, a partial, faster reimplementation of the Apollo API) because of how slow Apollo is, which avoids this issue in other situations.
However, creating an organism also generates this same list of organisms, a process that currently takes 74 seconds on EU.
We can bump the timeouts as a short-term solution, which should help, but long term, hopefully this can be fixed by Apollo3.
I’ve requested an infrastructure change at EU, and hopefully they can merge it soon: “Increase proxy timeout for apollo” by hexylena, Pull Request #1013 in usegalaxy-eu/infrastructure-playbook on GitHub.
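
To make the timeout chain above concrete, here is a rough Python sketch, not the actual proxy configuration. The 74 second duration comes from the numbers above. The assumption that both proxies use a 60 second limit, the 120 second "bumped" value, and the function name `hops_that_time_out` are all illustrative guesses on my part, not what the pull request necessarily sets.

```python
def hops_that_time_out(duration_s, timeouts_s):
    """Return the names of the hops whose timeout is shorter than the request."""
    return [name for name, limit in timeouts_s.items() if duration_s > limit]


# Organism creation currently takes about 74 seconds on EU (from the post).
ORGANISM_CREATION_S = 74

# Assumption: both proxies give up after 60 seconds.
current = {
    "proxy A (usegalaxy.eu)": 60,
    "proxy B (apollo.internal)": 60,
}

# Hypothetical bumped value; the real number in the pull request may differ.
bumped = {name: 120 for name in current}

print(hops_that_time_out(ORGANISM_CREATION_S, current))
print(hops_that_time_out(ORGANISM_CREATION_S, bumped))
```

With the assumed 60 second limits, both hops cut the request off. With the higher limit, neither does, so the creation script would get its response and could go on to the permissions step. The request is still slow either way, which is why this is only a short-term fix.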