Increasing cores

I’m currently running MAFFT on Galaxy Europe. My multifasta is admittedly large (26 MB), and the job has been running for 4 days. Does anyone know what the default number of cores/threads for this tool is, and whether I can file a request on a GitHub site to increase the cores?

Thanks!

Hi @jaredbernard

Did the job finish by now?

Hi, @jennaj. No, not yet. The command line seems to indicate that the default is 1 core: --thread ${GALAXY_SLOTS:-1}
Some of my multifastas are quite large, and a few have failed with exit code 1, so I was wondering whether a different program would handle them better, such as SINA, although that one relies on a reference if I’m not mistaken. I just wanted to use MAFFT because it was shown to be better than others at handling genomic data (Portik & Wiens 2021).

Hi @jaredbernard

The MAFFT tool is computationally expensive. In the past, the tool has only handled “smaller” datasets on the public servers. But the EU server offers the most of any … so you are testing in the right place.

You could ask the admins about extending the resources. Find their chat at the bottom of the server homepage. You can link back here to provide extra context for the request.

Other than that, you could try out different ways of processing. Maybe align a core set of sequences first to get an idea of the maximum successful input and the best parameters, then layer in more sequences with MAFFT add. Your limits for this tool are the length and depth of the sequences, not so much the total file size directly. The parameter choices matter too, of course.
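As a rough command-line sketch of that core-then-add idea (on Galaxy you would do the same with the MAFFT and MAFFT add tools): the helper name split_fasta, the file names, and the core size of 200 are all made up for illustration, and the mafft calls are left commented out because they assume a local MAFFT install.

```shell
# Hypothetical helper: write the first N records of a multi-FASTA to one file
# and all remaining records to another.
#   $1 = input FASTA, $2 = number of core records, $3 = core output, $4 = rest output
split_fasta() {
  awk -v n="$2" -v core="$3" -v rest="$4" '
    /^>/ { count++ }                        # each header line starts a new record
    { print > (count <= n ? core : rest) }  # route the line by record number
  ' "$1"
}

# Example workflow (file names and the core size are assumptions):
# split_fasta all_sequences.fasta 200 core.fasta rest.fasta
#
# Align the core set first, then add the remaining sequences to that alignment.
# mafft --auto --thread "${GALAXY_SLOTS:-1}" core.fasta > core_aln.fasta
# mafft --add rest.fasta --thread "${GALAXY_SLOTS:-1}" core_aln.fasta > full_aln.fasta
```

Taking the first N records is only the simplest possible choice. A core set picked to cover the diversity of your sequences will probably give a better backbone alignment.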


Thanks, @jennaj. I’ll try asking about the cores on the EU chat – I couldn’t remember where to find it. And thanks for the tip about MAFFT add!

Hi,

You can find the resources here: tpv-shared-database/tools.yml at main · galaxyproject/tpv-shared-database · GitHub, which is shared across Galaxy instances. The local overrides for EU are here: infrastructure-playbook/files/galaxy/tpv/tools.yml at master · usegalaxy-eu/infrastructure-playbook · GitHub

${GALAXY_SLOTS:-1} is Bash syntax. It just means that if the environment variable GALAXY_SLOTS is not set (or is empty), the value defaults to 1 core. We do set GALAXY_SLOTS with the values given in the linked resource files.
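If you want to see the expansion for yourself, here is a small sketch you can paste into any Bash shell. The value 8 is just an example, not what the EU server actually assigns to MAFFT, and the threads_* variable names are made up for the demo.

```shell
# Case 1: GALAXY_SLOTS is unset, so the fallback after ":-" is used.
unset GALAXY_SLOTS
threads_unset="${GALAXY_SLOTS:-1}"
echo "unset: --thread $threads_unset"

# Case 2: GALAXY_SLOTS is set (8 is an arbitrary example value), so it wins.
GALAXY_SLOTS=8
threads_set="${GALAXY_SLOTS:-1}"
echo "set:   --thread $threads_set"

# Case 3: GALAXY_SLOTS is set but empty; ":-" treats empty like unset.
GALAXY_SLOTS=""
threads_empty="${GALAXY_SLOTS:-1}"
echo "empty: --thread $threads_empty"
```

So the 1 you see in the command line is only the fallback, not necessarily the number of cores the job actually gets.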

What is the scientific question you want to answer? There might be a better way to solve your problem that does not involve MAFFT.

Ciao,
Bjoern


Thanks for getting back to me, Björn. I assumed that was the case with GALAXY_SLOTS, but it’s good to get verification.

I’m constructing large gene phylogenies so I can compare their rates of evolution. So somehow, I’ll need to get the alignments done and then the trimming. Right now, I’m trying MAFFT add, as @jennaj suggested.

Hi @jaredbernard, we do have a related tutorial. It uses a similar tool that you could also try out: Preparing genomic data for phylogeny reconstruction