$GALAXY_SLOTS equals 1

Hello!

I was advised to use the $GALAXY_SLOTS variable to obtain the number of processor cores (Workflow automation? - #2 by mvdbeek). But when I print this variable in the tools, I get exactly 1, even though I have 16-48 logical processors.

Why is it not set automatically? E.g., why not add

 elif [ "$(nproc)" -gt 0 ]; then
     GALAXY_SLOTS=$(nproc)

into /data/galaxy/lib/galaxy/jobs/runners/util/job_script/CLUSTER_SLOTS_STATEMENT.sh?

We assume, and highly recommend, that jobs submitted by Galaxy are run on an HPC system with a job scheduler (like SLURM, PBS, Torque, or LSF) or on Kubernetes. You then configure the destinations in the job_conf.xml file to request the required resources, and you assign tools to those destinations. If you don’t have this available you can use local_slots, as you found out, but you still need to map tools to destinations.
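
For illustration, a minimal job_conf.xml along these lines might look roughly like the following (the destination id, worker count, slot value, and the hisat2 tool id are made-up examples; local_slots is the destination parameter the local runner uses to populate $GALAXY_SLOTS):

<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/>
    </plugins>
    <destinations default="local_4cores">
        <!-- every job sent to this destination gets GALAXY_SLOTS=4 -->
        <destination id="local_4cores" runner="local">
            <param id="local_slots">4</param>
        </destination>
    </destinations>
    <tools>
        <!-- example mapping: send a multi-threaded tool to the 4-core destination -->
        <tool id="hisat2" destination="local_4cores"/>
    </tools>
</job_conf>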

Note that this is not recommended, as there is no way for Galaxy to know what resources are available, so you will start consuming more resources than are available if you queue up many jobs.

Connecting Galaxy to a compute cluster walks through installing SLURM using Ansible on the head node … while that’s still not ideal, it is better than using the local job runner.

Is it necessary to use Ansible for the installation? I have never used Ansible, so I think it will take quite a while for me to learn it.

I installed slurm without Ansible, and it works, but I see GALAXY_SLOTS=2 even though I have 32 logical processors, so each task uses no more than 2 cores (as htop shows, for example). Is it possible to assign more cores to the tasks (and maybe even a different number of cores to different tasks)?

transgen@transgen-3:~/galaxy/database/jobs_directory/000/761$ cat /etc/slurm-llnl/slurm.conf
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=localhost
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm.state
SwitchType=switch/none
TaskPlugin=task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
# JobAcctGatherType=jobacct_gather/cgroup
#SlurmctldDebug=info
#SlurmctldLogFile=
#SlurmdDebug=info
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=transgen-3 NodeAddr=localhost CPUs=32 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=64306 State=UNKNOWN
PartitionName=transgen-3-partition Nodes=transgen-3 Default=YES MaxTime=INFINITE State=UP

transgen@transgen-3:~/galaxy/database/jobs_directory/000/761$ cat /etc/slurm-llnl/cgroup.conf
###
# Slurm cgroup support configuration file.
###
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=yes
ConstrainDevices=yes
ConstrainKmemSpace=no        #avoid known Kernel issues
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
TaskAffinity=no              #use task/affinity plugin instead

transgen@transgen-3:~/galaxy/database/jobs_directory/000/761$ cat ~/galaxy/config/job_conf.xml
<?xml version="1.0"?>
<!-- A sample job config that explicitly configures job running the way it is
     configured by default (if there is no explicit config). -->
<job_conf>
    <plugins>
        <!--plugin id="pulsar" type="runner" load="galaxy.jobs.runners.pulsar:PulsarRESTJobRunner"/-->
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
        <!-- plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="4"/-->
    </plugins>
    <destinations>
	    <!-- destination id="local" runner="local"/-->
        <destination id="slurm" runner="slurm">
        </destination>
        <!--destination id="remote_cluster" runner="pulsar">
            <param id="url">http://localhost:8913/</param>
            <param id="submit_native_specification">-P bignodes -R y -pe threads 16</param>
            < ! - - Look for trinity package at remote location - define tool_dependency_dir
            in the Pulsar app.yml file.
             - - >
            <param id="dependency_resolution">remote</param>
        </destination-->
    </destinations>
</job_conf>

(Also, on another machine slurm does not work and reports "drained Low RealMemory", but I think that is slightly outside the scope…)

And in general, is there a way to schedule jobs taking into account the actual amount of cores/RAM they use? Each tool uses a quite different amount of resources: if I assign 1 core to each, the most computation-intensive tools will run quite slowly, but if I assign e.g. 16 cores, I will be able to run no more than 2 jobs in parallel, most of which will never utilize that much processing power. So I end up with great underutilization of my CPU in both cases, and the same goes for memory.

Mapping tools to destinations with different resources is explained in Mapping Jobs to Destinations.
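
As a rough sketch of what that can look like with the slurm runner (the destination ids, the core/memory numbers, and the spades tool id are just placeholders; nativeSpecification is the destination parameter passed through to SLURM as submission options, and the CPUs requested there are what should end up in $GALAXY_SLOTS):

<job_conf>
    <plugins>
        <plugin id="slurm" type="runner" load="galaxy.jobs.runners.slurm:SlurmJobRunner"/>
    </plugins>
    <destinations default="slurm_small">
        <!-- default: 1 core and modest memory -->
        <destination id="slurm_small" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=1 --mem=4000</param>
        </destination>
        <!-- larger allocation for heavy tools -->
        <destination id="slurm_big" runner="slurm">
            <param id="nativeSpecification">--nodes=1 --ntasks=1 --cpus-per-task=16 --mem=32000</param>
        </destination>
    </destinations>
    <tools>
        <!-- example mapping: only the heavy tool gets the 16-core destination -->
        <tool id="spades" destination="slurm_big"/>
    </tools>
</job_conf>

Keep in mind that SLURM schedules on the resources you request, not on what a job actually uses, so choosing sensible per-destination values and mapping tools to them is still up to you.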

Follow-up question: Is there a way to increase the number of cores used on a public server such as Galaxy Europe, rather than on a local installation?

Perhaps this is easy to answer, but the closest I see is this for Galaxy Main. Whether the user has any control is unclear. Is the default 1 core? And can the user specify a different number of cores?

Hi @jaredbernard,
in the specific case of usegalaxy.eu, you can open a PR on this file in order to increase the resources of a specific tool.

Regards
