Kraken2 build databases download fails with not found path

Hello,

When installing a Kraken2 DB using
toolshed.g2.bx.psu.edu/repos/iuc/data_manager_build_kraken2_database/kraken2_build_database/2.1.6+galaxy0

I tried to install a database, i.e. k2_pluspf_16_GB_20250714

But the path is incorrect and it fails at downloading:

    --2026-02-01 08:49:52-- https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16gb_20250714.tar.gz
    Resolving genome-idx.s3.amazonaws.com (genome-idx.s3.amazonaws.com)... 16.15.194.29, 3.5.8.122, 3.5.25.23, ...
    Connecting to genome-idx.s3.amazonaws.com (genome-idx.s3.amazonaws.com)|16.15.194.29|:443... connected.
    HTTP request sent, awaiting response... 404 Not Found
    2026-02-01 08:49:52 ERROR 404: Not Found.

The name is kraken/k2_pluspf_16_GB_20250714.tar.gz and not k2_pluspf_16gb_20250714.tar.gz.
(Note the underscore between 16 and GB).

Where should this be fixed/updated? I suppose I can fix the name locally on our UseGalaxy instance.

Side question: is this the correct place to raise such issues, or would there be a more appropriate medium?

Cheers,

Hi @ccoulombe

This is a great place to ask questions! We can try to help directly by coordinating help, plus route you to the best contacts as needed. :scientist:

For your question about Data Manager, this is the version you are using, correct? Galaxy | Tool Shed

I can confirm that the path for the PlusPF-16 dataset would be

https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16_GB_20251015.tar.gz

Then, the two prior releases are formatted like this. The format changed between April and July 2025. I can’t find the release note, but there was likely some reason at the higher level before it flowed down to everyone.

https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16_GB_20250714.tar.gz
https://genome-idx.s3.amazonaws.com/kraken/k2_pluspf_16gb_20250402.tar.gz

So, yes, the underscore and the case are different, exactly as you noticed!
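To make the difference concrete, here is a minimal Python sketch that builds both file-name styles seen in the links above. The function name `candidate_urls` and its parameters are made up for illustration; only the base URL and the two naming patterns come from this thread.

```python
BASE_URL = "https://genome-idx.s3.amazonaws.com/kraken"


def candidate_urls(collection, size_gb, release):
    """Return the download URL in both naming styles seen in this thread.

    collection, size_gb and release are hypothetical parameter names,
    e.g. ("pluspf", 16, "20250714").
    """
    # Style with underscore and upper-case GB (k2_pluspf_16_GB_20250714)
    with_underscore = f"{BASE_URL}/k2_{collection}_{size_gb}_GB_{release}.tar.gz"
    # Style without underscore and lower-case gb (k2_pluspf_16gb_20250402)
    without_underscore = f"{BASE_URL}/k2_{collection}_{size_gb}gb_{release}.tar.gz"
    return [with_underscore, without_underscore]


for url in candidate_urls("pluspf", 16, "20250714"):
    print(url)
```

Trying each printed URL in turn (for example with wget) should show which style a given release actually uses.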

This was a good catch and I’m glad you reported it. I’ve opened a PR for review by the IUC. They will comment and let us know if!

For immediate use, you could try using the Custom option on the data manager form. Or, retrieve the 2024 release?

You could also load up the indexes hosted at the UseGalaxy servers! These are a remote CVMFS service attached to your local server. You would gain access to all of the reference data hosted, not just Kraken2, which means you don’t have to store it persistently. If this interests you, we have some tutorials walking through the configuration! Please ask if you have questions about it.

When using CVMFS, you can still attach local files through the regular tool-data. Meaning, these can work together. The tool-data tables can handle duplications and present them as an option to users. The first key in loc files is usually the primary key, and the URL is fixed too, but the name variable is usually what you can customize any way you want.
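If you combine local and CVMFS entries, a quick check for repeated keys can help before reloading the tables. This is only a minimal Python sketch: the sample content, the paths, and the assumed column order (key, display name, path) are hypothetical, so compare against the columns declared in your own tool data table config.

```python
from collections import Counter

# Hypothetical .loc content: tab-separated columns assumed to be
# value (the key), name (the label users see), and path.
SAMPLE_LOC = (
    "# comment lines start with a hash\n"
    "pluspf_16gb_local\tPlusPF-16 (local copy)\t/data/kraken2/pluspf_16gb\n"
    "pluspf_16gb_cvmfs\tPlusPF-16 (CVMFS)\t/cvmfs/example/kraken2/pluspf_16gb\n"
    "pluspf_16gb_local\tPlusPF-16 (duplicate key)\t/data/kraken2/other\n"
)


def read_loc(text):
    """Parse tab-separated .loc text, skipping blank and comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rows.append(line.split("\t"))
    return rows


def duplicate_keys(rows):
    """Return first-column values that appear more than once."""
    counts = Counter(row[0] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = read_loc(SAMPLE_LOC)
print(duplicate_keys(rows))
```

To check a real file, read its text and pass it to `read_loc` in place of the sample string.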

Please let us know if this addresses all of what you noticed! And, if you can get the custom option to work, great! The tool update will take some time to move through the review process – how long is difficult to guess, it could be fast! :slight_smile:


Thanks!

If helpful, I can open such PR next time (hence my question about this medium).

Oh interesting, the custom option. I’ll look into that but I’m unsure about the values that I should provide for the required parameters like K-mer length in BP *.

The reference data made available through CVMFS is really what we should be using.

Thanks for pointing this out!! Really helpful!

Hi @jennaj

On our instance (UseGalaxy.ca) we are already using the reference data from CVMFS

    # CVMFS
    tool_data_table_config_path: /cvmfs/data.galaxyproject.org/byhand/location/tool_data_table_conf.xml,/cvmfs/data.galaxyproject.org/managed/location/tool_data_table_conf.xml

And I confirm we do have access to the databases made available.

But in the case that brought up the download issue, the user asked for an updated version of the Kraken2 databases (~2025-07), and as far as I can tell, this is not available yet?

Thanks again for the help!

Hi @ccoulombe

I didn’t know this was for UseGalaxy.ca!! Welcome aboard. I’ll send you a direct message with more about this.

Then, for the Kraken2 indexes, yes, once the DM is corrected, the IDC will be able to generate the indexes into CVMFS and your server can inherit them, layering in consistency between the UseGalaxy servers (aspirational, so your decision of course!).

Some updates are already requested and I just added this new one in.

Next time, or even this time, you are welcome to join the development directly as the starting point. If you notice an issue with a tool wrapper, follow the links to the source code and propose changes in a PR, or open an Issue ticket for discussion. Then you can link that back into discussions here if you notice something or want to create a topic for end-users to find, especially if the item impacts all servers.

In short, two steps: 1) update a DM to incorporate new data, then 2) use the DM to get the data into the shared mount. Often only the second step is needed; it is less common that there isn’t a DM (yet).

Thanks for the details!

I’ve commented on the PR, and notified the maintainer of the Kraken2 DBs: Please keep nomenclature of file name consistent. · Issue #44 · BenLangmead/aws-indexes · GitHub

In the meantime, I’ve checked with the user and using the version from December 2024 works for him so I went ahead and installed it.

It installed correctly, but then the user reports that a job using that reference DB fails with:

but this is so strange since the file is actually present:

$ ls 2026-02-03T191402Z_standard_prebuilt_pluspfp_16gb_2024-12-28/taxo.k2d
2026-02-03T191402Z_standard_prebuilt_pluspfp_16gb_2024-12-28/taxo.k2d

Does this ring a bell?

Thanks

Hi @ccoulombe

Yes, using the files outside of CVMFS might have some trouble. Correcting the automatic indexing is part of what the rewrite for the DM was addressing. I wonder if we missed something .. but I would check the job runtime environment first!

Ideas

  1. Make sure the tool is running inside of the tool’s container environment (if possible!) and that all reference data is actually available to the runtime environment.

    • My guess is that none of the files were accessible. Is the data nested? Or, do you need to adjust file permissions? Compression status? Or, the file names?
    • Then, cross test: does this error only come up with this new database or all Kraken2 databases? Just locally, from CVMFS, or all?
    • You can use the tool-test-1 sample to test directly as an administrator and watch the job environment. This will give you some clues about where in the job steps the data can’t be seen by the tool at runtime.
  2. Compare your local tables and file structure to those in CVMFS. Is anything missing from a loc file? All files are named correctly? In the same directory structure?

  3. Double check that you are using the very latest version of the DM from the main toolshed to create the index suite. If you can pinpoint a problem with this new version, then you can report it (Issue) or suggest a correction directly (PR). You can link back anything you create!

  • This is part of what was being reviewed in my PR – Wolfgang thinks that changing the name of the macro wasn’t really enough – and the directory the data is in needs an adjustment too. :grimacing:
  • But, that was for the new format of the 2025 indexes and you seem to be using 2024. Those are not in CVMFS so I still think you are creating these.
  • Maybe there is an issue with all new additional indexes, or maybe just under certain cluster configurations, or with a certain version of Galaxy, those sorts of situations.

A laundry list! But I tried to put these in the order we usually go through it!
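For the first item, a minimal Python sketch like the one below can be run both on the host and inside the job's runtime environment to compare what each one sees. The function name is made up for illustration; the three file names are the ones that make up a built Kraken2 database.

```python
import os

# File names that make up a built Kraken2 database.
EXPECTED_FILES = ("hash.k2d", "opts.k2d", "taxo.k2d")


def check_kraken2_db(db_dir):
    """Return a list of human-readable problems found for db_dir."""
    if not os.path.isdir(db_dir):
        return [f"{db_dir}: not a directory (or not visible here)"]
    problems = []
    for name in EXPECTED_FILES:
        path = os.path.join(db_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{path}: missing")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: present but not readable")
    return problems


if __name__ == "__main__":
    import sys

    # Pass one or more database directories on the command line.
    for db_dir in sys.argv[1:]:
        issues = check_kraken2_db(db_dir)
        print(db_dir, "OK" if not issues else issues)
```

If the host reports no problems but the same check inside the container reports the directory as not visible, that points at mounts rather than at the files themselves.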

Let’s also ping @wm75 to see if he noticed anything else or maybe what this is exactly! I see the 2024 indexes in the local data on the EU server, so he is likely who created it there. He may also be able to share that index directory and the loc table lines with you directly.

This is weird indeed, and we don’t have that precise combination of DB and release date installed on Galaxy Europe.
I’m running this now and can let you know by tomorrow whether this works for us.

Ok, the newly installed 2026-02-05T110950Z_standard_prebuilt_pluspfp_16gb_2024-12-28 DB is accessible and works fine on Galaxy Europe without any tweaks.

So I guess you need to follow @jennaj's suggestions and check e.g. permission settings of your mounted data dir, etc.

Thank you @jennaj for the laundry list.
Thanks @wm75 for confirming!

We’ll continue digging…

Thanks and chapeau to @jdavcs :top_hat:, with one fix he solved two issues!! :slight_smile: :tada:

A bind-mount for the singularity_volume to the tool_data dir solved it.
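For anyone landing here later: a file can exist on the host and still be invisible to a containerized job if no volume covers its directory. Below is a minimal Python sketch of that check. It assumes a comma-separated volume string whose entries look like `path`, `path:mode` or `host:container:mode`; the example string and paths are hypothetical and not copied from our configuration.

```python
import os


def container_paths(volumes):
    """Return the container-side path of each entry in a volume string.

    Assumes entries of the form path, path:mode or host:container:mode.
    """
    paths = []
    for entry in volumes.split(","):
        parts = entry.strip().split(":")
        if not parts[0]:
            continue
        # host:container:mode -> container side is parts[1]; otherwise parts[0]
        paths.append(parts[1] if len(parts) == 3 else parts[0])
    return paths


def is_covered(path, volumes):
    """True if path lies inside one of the mounted container paths."""
    target = os.path.normpath(path)
    for mount in container_paths(volumes):
        mount = os.path.normpath(mount)
        if target == mount or target.startswith(mount.rstrip(os.sep) + os.sep):
            return True
    return False


# Hypothetical values for illustration only.
before = "/srv/galaxy/server:ro,/srv/galaxy/jobs:rw"
after = before + ",/srv/galaxy/tool-data:ro"
db_file = "/srv/galaxy/tool-data/kraken2/taxo.k2d"
print(is_covered(db_file, before), is_covered(db_file, after))
```

With the hypothetical values above, the database file is only covered once the tool-data directory is added to the volume list.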
