Kraken2 new Core-nt reference

I’m trying to run Kraken2 with the new reference, but it immediately fails with an error that contains no details about what went wrong.

Hi @Jon_Colman

Your error looks like this, correct? It appears inside the red error dataset, expanded on the job information page (via the i-icon).

No destinations are available to fulfill request: toolshed.g2.bx.psu.edu/repos/iuc/kraken2/kraken2/.*

If yes, it is because the database is brand new at UseGalaxy.org and we are still adjusting the resources. Meanwhile, you can choose a different database or run the tool at UseGalaxy.eu instead. :slight_smile:

If no, and you are getting a different error, you are welcome to post a share link to the error and we can review it more closely. Please see the banner on this forum for the how-to, or go here directly. → How to get faster help with your question

More soon about the known problem resolution, and thanks for reporting the problem!

Hi Jenna,
Yes, I was thinking that could be the problem. I was trying to run them through the .eu site; Kraken2 appears to be working there, but after a full day the jobs don’t complete.

Another issue with Kraken2 that I have noticed on both the main and .eu sites: most of our files are formatted as fastqsanger.gz, and when running Kraken2 with the option to output Classified and Unclassified reads, the output files are being decompressed yet are still listed as fastqsanger.gz, which causes problems when using these files with the next program. If I try to compress these files, it makes things even worse. My workaround was to change the output’s datatype attribute to fastqsanger, after which everything works fine.
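As a quick local check of this mismatch (a minimal sketch; the filenames here are made up for illustration), the gzip magic bytes reveal whether a file labeled `.gz` is actually compressed:

```python
import gzip

def looks_gzipped(path):
    """Return True if the file starts with the gzip magic bytes (0x1f 0x8b)."""
    with open(path, "rb") as fh:
        return fh.read(2) == b"\x1f\x8b"

# Simulate the reported bug: plain-text FASTQ saved under a .gz name.
with open("classified.fastq.gz", "w") as fh:
    fh.write("@read1\nACGT\n+\nIIII\n")

print(looks_gzipped("classified.fastq.gz"))  # False: mislabeled, not compressed

# A genuinely compressed copy passes the check.
with gzip.open("real.fastq.gz", "wt") as fh:
    fh.write("@read1\nACGT\n+\nIIII\n")

print(looks_gzipped("real.fastq.gz"))  # True
```

If the check returns False for a dataset labeled fastqsanger.gz, then the datatype label and the actual content disagree, which matches the behavior described above.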

Hi @Jon_Colman

I would be very interested in reviewing that use case!

The dataset name shouldn’t matter, only the assigned format datatype. And for implicitly converted reads (that “decompressed” step), the alternative version(s) are, in a sense, nested into the original dataset and should be available to any downstream tool that needs them.

Would you like to share back your example? Maybe there is a corner-case bug where the downstream tool isn’t finding all of the different format versions of the data for some reason. We could fix it! :slight_smile:

Update: I reread and think I can test this on my own. Started in here. But you can still share your example since I’m not sure which downstream tool you are using and that might matter.

For the job not completing at EU, do you mean that it is still executing (yellow datasets)? If so, it is still processing. This is a large target database, so long runtimes might be expected, but I can check all of that too if you want to share the history with the job.

The core_nt database is now ready to use at UseGalaxy.org.

Thanks! :slight_smile:

I have noticed one issue with Kraken2 on larger datasets: it will crash from lack of memory. For example, if I try to extract certain read classifications and then extract the host (the largest classification), it runs out of memory on the host classification. I believe this currently only happens on usegalaxy.org, which is running Kraken2 much faster than usegalaxy.eu. Maybe more memory can be allocated for this?
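For what it’s worth, one memory-friendly approach to extraction is to stream the Kraken2 per-read output and collect only matching read IDs, rather than holding reads in memory. This is a minimal sketch assuming Kraken2’s documented per-read output columns (C/U flag, read ID, taxid, read length, LCA k-mer map); the sample lines and taxids below are made up:

```python
def ids_for_taxid(kraken_output_lines, taxid):
    """Stream Kraken2 per-read output and return IDs of reads assigned to taxid."""
    wanted = set()
    for line in kraken_output_lines:
        fields = line.rstrip("\n").split("\t")
        # fields: [C/U flag, read ID, taxid, read length, LCA k-mer map]
        if fields[0] == "C" and fields[2] == str(taxid):
            wanted.add(fields[1])
    return wanted

# Hypothetical sample output lines for illustration only.
sample = [
    "C\tread1\t9606\t150\t9606:120",
    "U\tread2\t0\t150\t0:120",
    "C\tread3\t562\t150\t562:120",
]
print(sorted(ids_for_taxid(sample, 9606)))  # → ['read1']
```

In practice the same loop works over an open file handle instead of a list, so only the ID set (not the reads themselves) occupies memory.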

Hi @Jon_Colman

For very large data, yes, the UseGalaxy.eu server can sometimes process memory intensive operations a bit better than the others.

Each server hosts different computational resources. This means that the cluster nodes have distinct profiles. Tools may run faster or slower, and a job may overwhelm the cluster at one server but not the other.

The Kraken2 tools already have the maximum resources allocated at UseGalaxy.org. The best advice is to run these jobs at whichever public server successfully processes them.

That said, if you want to send in a bug report for an example error, I’ll take another look. Please include in the comments the same exact job as processed at the EU server (it must be the exact job, in a shared history), and we’ll take a second look at the technical details. Also put a link to this topic in the comments. Without these we can’t do the comparison, and you’ll probably just get a general help message suggesting you try EU instead. :slight_smile:

XRef → FAQ: Understanding 'exceeds memory allocation' error messages