Job failure due to exceeding resources at public Galaxy servers -- Solution: Modify the job or consider a custom Galaxy server

Hi,

I am trying to run blastn with a ~21 MB .ffn file of genes from a bacterial genome as the query and a ~6 GB .fasta file consisting of genes from multiple different bacterial genomes as the target. However, the job failed with this error:

'num_threads' is currently ignored when 'subject' is specified.
/jetstream/scratch0/main/jobs/37541641/command.sh: line 120: 2997 Killed blastn -query '/jetstream/scratch0/main/jobs/37541641/inputs/dataset_60967048.dat' -subject '/jetstream/scratch0/main/jobs/37541641/inputs/dataset_60968206.dat' -task 'blastn' -evalue '0.001' -out '/jetstream/scratch0/main/jobs/37541641/outputs/dataset_60990860.dat' -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen' -num_threads "${GALAXY_SLOTS:-8}" -strand both -dust yes -parse_deflines

Has anyone had experience solving this error and submitting blastn jobs successfully?

Thanks!

Hi @earthworm

Thanks for sending in the bug report. I replied to it with more details specific to your situation, so below is just a summary for others reading along.

  1. What does this warning mean? 'num_threads' is currently ignored when 'subject' is specified.
  • It means the Galaxy server you are working on is configured to set the number of threads a job uses based on a built-in target database (one already run through makeblastdb).
  • If a custom target fasta is supplied instead (via 'subject'), that pre-set option is ignored. Why? For practical reasons determined by the authors of BLAST+. The warning is reported on stderr by the underlying tool itself.
  • A custom target fasta can be run through makeblastdb in Galaxy (the tool is wrapped and available), but the indexing may not work for every fasta format (anywhere), and at public Galaxy servers it may fail for fasta datasets with a very large number of individual sequences (the indexing job also exceeds resources). All BLAST+ tools are picky about the format of both queries and targets. See the first sketch after this list for an indexed-database workflow.

  2. When a job is too large to run at a public server as submitted (for resource reasons), modifying the inputs or refining parameters can be an effective solution. The goal is to reduce the memory the job needs, shorten its runtime, or limit the output content so that the work does not exceed the available computational resources. See the second sketch after this list for examples.

  3. Galaxy itself can handle even the largest of projects. However, very large, compute-intensive, and/or time-sensitive analysis projects are not appropriate for public Galaxy servers. Consider setting up your own Galaxy server (personal, lab, institutional) and allocating sufficient resources. For any tool, the resources required to run a job in Galaxy are the same as for running the same tool on the command line, and the underlying tool's documentation is the best place to learn about tuning computing resources. Admins of a Galaxy server can also specify how attached computing resources (local or cloud) are used.
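
First sketch, for illustration only: file names like targets.fasta and genes.ffn are placeholders, and the thread count is arbitrary. A custom target fasta could be indexed once with makeblastdb and then searched via -db rather than -subject, which also lets -num_threads take effect:

    # Index the custom target fasta once (nucleotide database).
    makeblastdb -in targets.fasta -dbtype nucl -out targets_db

    # Search against the indexed database; with -db instead of -subject,
    # -num_threads is honored.
    blastn -query genes.ffn -db targets_db -task blastn -evalue 0.001 \
        -outfmt 6 -num_threads 4 -out results.tsv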
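
Second sketch, one way of shrinking a job (again with placeholder file names, and assuming the indexed database built above): split the query into chunks and cap the output volume. The e-value and hit-cap values here are arbitrary examples, not recommendations for your data.

    # Split the query fasta into 4 roughly equal chunks, round-robin by record.
    awk '/^>/{n++} {print > ("chunk_" ((n-1) % 4) ".ffn")}' genes.ffn

    # Run each chunk separately; a stricter e-value and a cap on reported
    # hits reduce both runtime and output size.
    for c in chunk_*.ffn; do
        blastn -query "$c" -db targets_db -evalue 1e-10 \
            -max_target_seqs 5 -outfmt 6 -out "${c%.ffn}.tsv"
    done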

More about setting up a Galaxy server for urgent or large work:

This forum can be searched with keywords like “cloud” or “gvl” to find prior Q&A about working with custom Galaxy servers. I also added some tags to your post that point to topics like that.

Hope that helps!
