I am running a local instance of Galaxy 22.01 on my institution's Slurm cluster via DRMAA. I have configured it to send jobs to our partition, giving each job one node and all of that node's CPUs (in job_conf.xml: `<param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=32 --partition=vgl</param>`). For background on this cluster: when I submit jobs via sbatch and a tool comes from a conda environment, I need to either (a) activate the conda env on the interactive head node and then submit the job, or (b) have the job script itself initialize conda by sourcing ~/.bashrc and then activate the required env. Otherwise the job can fail because it cannot find the tool.
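To make option (b) concrete, my job scripts look roughly like the sketch below. The conda location and environment name are placeholders, and I source conda.sh directly as a common variant of sourcing ~/.bashrc, since the batch shell is non-interactive and never runs the bashrc conda hook on its own:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --partition=vgl

# Placeholder conda location -- wherever conda is installed on the cluster.
CONDA_BASE="$HOME/miniconda3"

# The batch shell is non-interactive, so the conda hook in ~/.bashrc never
# runs; initialize conda explicitly inside the job before activating the env.
if [ -f "$CONDA_BASE/etc/profile.d/conda.sh" ]; then
    source "$CONDA_BASE/etc/profile.d/conda.sh"
    conda activate bwa-env    # placeholder environment name
fi

# Without the activation above, the tool is simply not on PATH.
command -v bwa || echo "bwa not on PATH"
```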
Onto the local Galaxy install!
Sometimes when I run a workflow, a tool works once but not another time, like in this screenshot:
Other times, a tool fails when I invoke a workflow that uses it. These two BWA MEM jobs failed for different reasons in the same workflow invocation, but when I invoked the workflow again, they both ran fine.
Another instance of BWA MEM working once but not again within the same invocation:
I have also been seeing this with BUSCO and QUAST. Here is an example of the workflow-invoked BUSCO failing, but then it ran fine on the same dataset when I clicked “rerun job”:
Sometimes for BWA MEM I get a database locked error (might be unrelated to the previous ones…):
```
Traceback (most recent call last):
  File "/lustre/fs5/vgl/scratch/labueg/galaxy_22.01/galaxy/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1803, in _execute_context
    cursor, statement, parameters, context
  File "/lustre/fs5/vgl/scratch/labueg/galaxy_22.01/galaxy/.venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked
```
The database-locked error above appeared while I was running the bwa-mem workflow a couple of times to try to debug it, so maybe it comes from running two invocations of the workflow at once? I only started seeing this error recently, as of last night when I was rerunning the workflow.
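From what I understand, this is SQLite's generic complaint when a second writer hits a database whose write lock is already held, which would fit two concurrent invocations both writing to Galaxy's default SQLite database. A minimal self-contained sketch of that collision (nothing Galaxy-specific; the table name is made up):

```python
import os
import sqlite3
import tempfile

db = os.path.join(tempfile.mkdtemp(), "demo.sqlite")

# Writer 1: autocommit mode (isolation_level=None) so we can open an
# explicit transaction and hold it ourselves.
writer = sqlite3.connect(db, isolation_level=None)
writer.execute("CREATE TABLE jobs (id INTEGER)")
writer.execute("BEGIN IMMEDIATE")             # take the write lock and hold it
writer.execute("INSERT INTO jobs VALUES (1)")

# Writer 2: short busy timeout so the failure shows up right away instead of
# after the default 5-second wait.
other = sqlite3.connect(db, timeout=0.1)
try:
    other.execute("INSERT INTO jobs VALUES (2)")
except sqlite3.OperationalError as exc:
    error_message = str(exc)
    print(error_message)  # database is locked
finally:
    writer.rollback()
    writer.close()
    other.close()
```

The second connection gives up once its busy timeout expires and raises exactly this OperationalError, which is why running the workflow twice at once could plausibly trigger it.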
For comparison, there are tools I have not hit this issue with: meryl, merqury, and hifiasm have all been running without a hitch, and I have not seen any of the above errors with them.
The workflows I am using are the HiC workflow (with BWA MEM) and the Hifiasm-HiC workflow (with BUSCO/QUAST), the latter at iwc/Galaxy-Workflow-Long_read_assembly_with_Hifiasm_and_HiC_data.ga at VGP · Delphine-L/iwc · GitHub. Here is the rest of my job_conf.xml in case it is helpful:
```xml
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
            <param id="drmaa_library_path">/vggpfs/fs3/vgl/store/labueg/programs/slurm-drmaa/slurm-drmaa-1.1.3/lib/libdrmaa.so</param>
        </plugin>
    </plugins>
    <destinations default="slurm-vgl">
        <destination id="local" runner="local"/>
        <destination id="slurm-vgl" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=32 --partition=vgl</param>
        </destination>
        <destination id="slurm-bigmem" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=64 --partition=vgl_bigmem</param>
        </destination>
    </destinations>
</job_conf>
```
Any help is appreciated, and please let me know if any more details would help! Thank you for your time!