Hello, Galaxy!
I am running a local instance of Galaxy 22.01 on my institution’s Slurm cluster via DRMAA. I have configured it to send jobs to our partition, where each job is given one node and all the CPUs on that node (in job_conf.xml: `<param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=32 --partition=vgl</param>`).
For more background on this cluster: when I submit jobs via sbatch and need a tool from a conda environment, I have to either (a) activate the conda env on the interactive head node and then submit the job, or (b) have the job script explicitly init my conda, source ~/.bashrc, and then activate the required env. Otherwise the job can fail because it cannot find the tool.
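To illustrate option (b), a manual job script for this cluster looks roughly like the sketch below (the conda install path, env name, and tool command are placeholders, not my exact setup):

```bash
#!/bin/bash
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=32 --partition=vgl

# Placeholder conda path and env name; the real ones differ.
# Make conda available in this non-interactive shell, then activate the env;
# without these steps the job can fail because the tool is not on PATH.
source /path/to/miniconda3/etc/profile.d/conda.sh
source ~/.bashrc
conda activate some_tool_env

some_tool --threads "$SLURM_CPUS_PER_TASK" input.fa > output.txt
```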
On to the local Galaxy install!
Sometimes when I run a workflow, a tool will work in one run but not in another, as in this screenshot:
Other times, a tool will not work when I invoke a workflow that uses it. These two BWA MEM jobs failed for different reasons in the same workflow invocation, but when I invoked the workflow again, both ran fine.
Another instance of BWA MEM working once but not again within the same invocation:
I have also been seeing this with BUSCO and QUAST. Here is an example of the workflow-invoked BUSCO failing, but then it ran fine on the same dataset when I clicked “rerun job”:
Sometimes for BWA MEM I get a “database is locked” error (might be unrelated to the previous ones…):
```
Traceback (most recent call last):
  File "/lustre/fs5/vgl/scratch/labueg/galaxy_22.01/galaxy/.venv/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1803, in _execute_context
    cursor, statement, parameters, context
  File "/lustre/fs5/vgl/scratch/labueg/galaxy_22.01/galaxy/.venv/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
sqlite3.OperationalError: database is locked
```
The database locked error above appeared while I was running the bwa-mem workflow a couple of times to try to debug it, so maybe it comes from trying to run the workflow twice at once or something? I have only started seeing this error recently, as of last night when trying those repeated runs.
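Since the error comes from sqlite3, I assume Galaxy is still using the default SQLite database it ships with; as far as I understand, SQLite locks the whole database file during writes, so two workflow invocations hitting it at once could plausibly collide. A rough check like the one below (the config path is an assumption about my install layout; I have not verified it) should show whether database_connection was ever changed in galaxy.yml:

```bash
# Sketch: confirm whether this Galaxy instance still uses the default SQLite database.
# The galaxy.yml location is assumed; adjust if the config lives elsewhere.
cd /lustre/fs5/vgl/scratch/labueg/galaxy_22.01/galaxy
grep -n "database_connection" config/galaxy.yml \
  || echo "database_connection not set, so the default SQLite database is in use"
```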
As for tools where I have not run into this issue: meryl, merqury, and hifiasm have all been running without a hitch, and I have not seen any of the above errors with them.
The workflows I am using are the HiC workflow (with BWA MEM) and the Hifiasm-HiC workflow (with BUSCO/QUAST; Galaxy-Workflow-Long_read_assembly_with_Hifiasm_and_HiC_data.ga in the VGP branch of Delphine-L/iwc on GitHub). Here is the rest of my job_conf.xml in case it is helpful:
```xml
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner">
            <param id="drmaa_library_path">/vggpfs/fs3/vgl/store/labueg/programs/slurm-drmaa/slurm-drmaa-1.1.3/lib/libdrmaa.so</param>
        </plugin>
    </plugins>
    <destinations default="slurm-vgl">
        <destination id="local" runner="local"/>
        <destination id="slurm-vgl" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=32 --partition=vgl</param>
        </destination>
        <destination id="slurm-bigmem" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=64 --partition=vgl_bigmem</param>
        </destination>
    </destinations>
</job_conf>
```
Any help is appreciated, and please let me know if any more details would help! Thank you for your time!