Adjusting uWSGI/mule settings to the available processor cores when using the local job runner

This is my first post here, so hi to everyone!

I have been a Galaxy user for a few years now, mostly on my own, locally, via Docker. Now I am about to use Galaxy in a course, so I need to dig a bit deeper into its configuration, and I need your help.

What I have
I have a server with 32 hardware threads. The group using it will be 20-25 people working at the same time.

What I need
I need to configure Galaxy, launched from Docker, so that it can run 32 single-threaded jobs in parallel on the local runner while EFFICIENTLY serving 20-25 simultaneous users. I also want to separately limit the number of concurrent high-memory jobs.

What I tried
I experimented with the uWSGI section of galaxy.yml (on a volume attached via docker run -v at /some_mount_path/galaxy-central/config/), setting the following parameters:

http: 127.0.0.1:8080
buffer-size: 65536
processes: 8
threads: 4
offload-threads: 2
static-map: /static/style=static/style/blue
static-map: /static=static
static-map: /favicon.ico=static/favicon.ico
master: true
virtualenv: .venv
pythonpath: lib
module: galaxy.webapps.galaxy.buildapp:uwsgi_app()
manage-script-name: false
thunder-lock: true
die-on-term: true
hook-master-start: unix_signal:2 gracefully_kill_them_all
hook-master-start: unix_signal:15 gracefully_kill_them_all
py-call-osafterfork: true
enable-threads: true
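
The request side of the "threads algebra" is simple arithmetic: in uWSGI, each of the `processes` workers runs its own pool of `threads` request threads, so up to processes × threads requests can be handled at once (8 × 4 = 32 with the settings above). The minimal stdlib sketch below reads flat `key: value` lines like those listed, keeping repeated keys such as `static-map` as lists, and computes that product. The helper names are hypothetical, only an excerpt of the options is embedded, and it does not model `offload-threads` or how Galaxy runs jobs.

```python
from collections import defaultdict
from typing import Dict, List

# Excerpt of the uWSGI options listed above (one "key: value" per line, no nesting).
UWSGI_OPTIONS = """\
http: 127.0.0.1:8080
processes: 8
threads: 4
offload-threads: 2
static-map: /static=static
static-map: /favicon.ico=static/favicon.ico
master: true
"""


def parse_flat_options(text: str) -> Dict[str, List[str]]:
    """Split each 'key: value' line at the first colon, keeping every value of repeated keys."""
    options: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(":")
        options[key.strip()].append(value.strip())
    return dict(options)


def request_slots(options: Dict[str, List[str]]) -> int:
    """processes x threads: requests uWSGI can handle at once; unset values are treated as 1."""
    processes = int(options.get("processes", ["1"])[-1])
    threads = int(options.get("threads", ["1"])[-1])
    return processes * threads


if __name__ == "__main__":
    opts = parse_flat_options(UWSGI_OPTIONS)
    print(opts["static-map"])   # ['/static=static', '/favicon.ico=static/favicon.ico']
    print(request_slots(opts))  # 32
```

Splitting only at the first colon keeps values such as `127.0.0.1:8080` or `galaxy.webapps.galaxy.buildapp:uwsgi_app()` intact.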

To limit the number of concurrent high-memory jobs, I created job_conf.xml in the same location:

<job_conf>
    <plugins>
        <plugin id="local" type="runner" load="galaxy.jobs.runners.local:LocalJobRunner" workers="8" />
    </plugins>
    <handlers>

    </handlers>
    <destinations default="single">
        <destination id="single" runner="local" tags="all_local"/>
        <destination id="highmem" runner="local" tags="all_local">
        </destination>
    </destinations>

    <limits>
        <limit type="anonymous_user_concurrent_jobs">1</limit>
        <limit type="destination_total_concurrent_jobs" tag="all_local">32</limit>
        <limit type="destination_total_concurrent_jobs" id="highmem">6</limit>
        <limit type="walltime">48:00:00</limit>
    </limits>

    <tools>
      <tool id="rgrnastar" destination="highmem" />
    </tools>
</job_conf>
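
To see at a glance which values in job_conf.xml bound job concurrency, a small stdlib sketch can summarise the file: runner `workers`, explicitly declared handlers, the default destination, the limits and the tool mappings. The function name `summarise_job_conf` is hypothetical; it only reads the XML and does not reproduce how Galaxy itself interprets these settings.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Any, Dict


def summarise_job_conf(xml_text: str) -> Dict[str, Any]:
    """Collect the job_conf.xml values that bound job concurrency (read-only, no Galaxy logic)."""
    root = ET.fromstring(xml_text)

    # Worker threads configured per runner plugin; None when the attribute is absent.
    workers = {}
    for plugin in root.iter("plugin"):
        value = plugin.get("workers")
        workers[plugin.get("id")] = int(value) if value is not None else None

    # Handlers declared explicitly in the file (the job_conf.xml above declares none).
    handlers = [h.get("id") for h in root.iter("handler")]

    destinations = root.find("destinations")
    default = destinations.get("default") if destinations is not None else None

    # Limits keyed by type plus the destination id or tag they apply to, if any.
    limits = {}
    for limit in root.iter("limit"):
        key = limit.get("type", "")
        if limit.get("id"):
            key += f"[id={limit.get('id')}]"
        elif limit.get("tag"):
            key += f"[tag={limit.get('tag')}]"
        limits[key] = (limit.text or "").strip()

    # Tool id -> destination id mappings.
    tools = {t.get("id"): t.get("destination") for t in root.iter("tool")}

    return {"workers": workers, "handlers": handlers,
            "default_destination": default, "limits": limits, "tools": tools}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarise_job_conf.py /some_mount_path/galaxy-central/config/job_conf.xml
    with open(sys.argv[1], encoding="utf-8") as fh:
        for name, value in summarise_job_conf(fh.read()).items():
            print(f"{name}: {value}")
```

For the file above this would report `workers` of 8 for `local`, an empty handler list, and the two `destination_total_concurrent_jobs` limits (32 for tag `all_local`, 6 for id `highmem`) side by side.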

My main problem is that I don't understand how the uWSGI processes and threads, the number of workers defined for the runner plugin, and the number of concurrently running jobs relate to each other. I used 8 processes with 4 threads each (8 × 4 = 32) and 8 workers in job_conf.xml. As a result, sometimes 8 and sometimes 10 jobs run simultaneously, the rest stay queued, and the web interface becomes unresponsive (with a single user!). When I run htop on the server, I can identify only two job handlers (handler0 and handler1) among the running processes (a Docker issue?).
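
One way to frame the job side of the question is sketched below. It rests on assumptions that should be checked against the Galaxy documentation for the installed version, not on anything confirmed here: each job handler process loads its own LocalJobRunner with its own pool of `workers` threads, each running local job occupies one worker thread until it finishes, and only the dedicated handler processes (the two seen in htop) run jobs. Under those assumptions the ceiling is workers × handlers, further capped by the destination limits. Both function names are hypothetical.

```python
from typing import Optional

# ASSUMPTIONS (not verified against Galaxy's source): one LocalJobRunner per
# handler process, each local job blocks one worker thread for its whole run,
# and only dedicated handler processes execute jobs.


def max_concurrent_local_jobs(workers: int, handlers: int,
                              total_limit: Optional[int] = None) -> int:
    """Assumed ceiling: workers per handler x number of handlers, capped by a total limit."""
    capacity = workers * handlers
    return capacity if total_limit is None else min(capacity, total_limit)


def workers_needed(target_jobs: int, handlers: int) -> int:
    """Smallest per-handler `workers` value whose assumed capacity reaches target_jobs."""
    return -(-target_jobs // handlers)  # ceiling division


if __name__ == "__main__":
    # From the post: workers="8", two handlers (handler0, handler1),
    # destination_total_concurrent_jobs for tag all_local = 32.
    print(max_concurrent_local_jobs(workers=8, handlers=2, total_limit=32))  # 16
    print(workers_needed(32, handlers=2))  # 16
    # Jobs sent to "highmem" would additionally be capped by their own limit of 6.
```

The uWSGI processes × threads product deliberately does not appear here: under these assumptions it sizes web request handling, not job execution.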

Could anyone explain the “threads algebra” to me, i.e. how to calculate and control the number of concurrent jobs? It would be great to use a uWSGI + mules setup, but how should I set it up to avoid overloading the server?

I would be more than thankful!