Kubernetes Azure (AKS) faulty deployment (even all pod status seem fine)

Problem:

  • We’re using the cloudve/galaxy Helm chart, but encountered deployment issues in AKS despite following the README.
  • After troubleshooting, we were able to deploy pods using the following command (choose install or upgrade as needed):
helm install/upgrade galaxy cloudve/galaxy -n galaxy -f collie-values.yaml --debug
  • The collie-values.yaml configures storageClass using azurefile-csi and disables problematic services (refdata , cvmfs ). Also, due to error similar to this issue, we disable all startupProbe, readinessProbe and livenessProbe.
  • While pods appear healthy visually (see attached image), web and celery pod logs indicate serious issues (access logs in the provided Google Drive folder):

    The 2 log file can be accessed here:
    galaxy-help-kubernetes - Google Drive

Question:

  • How can we fix the errors in the web and celery pods and successfully launch Galaxy?

Request:

  • Any recommendations to troubleshoot and resolve the errors are appreciated.

Hi @unibkmasterstu

I’ve cross-posted your question over to the Admin chat to attract some help for this. They may reply here or there, and feel free to join!
:hammer_and_wrench: You're invited to talk on Matrix

Thanks!