I have Illumina NextSeq 2000 shallow shotgun metagenomic sequences (from fecal samples) that have been quality filtered and trimmed. I need to classify carnivore prey (metazoa), gastrointestinal parasites, bacteria, plants, protists, etc. Kraken2 has a comprehensive core_nt database (the index is 233 GB) available on their website, but it is not available in Galaxy as a database option. I have it downloaded on an external hard drive (k2_core_nt_20240904.tar.gz) and could upload it if I had enough space. Alternatively, could I import the NCBI Genome Dataset for Eukaryota (taxonomy ID: 2759) into Galaxy? What approach would you recommend?
I was able to upload a database from Eukaryome, which I can use with Bowtie2. However, for larger databases it looks like I need to do an FTP upload. This Galaxy document (Galaxy FTP Upload - Galaxy Community Hub) says “FTP upload is not supported on the usegalaxy.org instance,” whereas this one (Galaxy FTP Upload - Galaxy Community Hub) says “The address of FTP server for Main Galaxy is usegalaxy.org. Use the same email and password as for Galaxy.” Regardless, CoffeeCup cannot connect to usegalaxy.org with my correct username and password. How can I get support with this?
Hey Laura! I can help with one piece of this at least. If you are comfortable with some basic command line, you can use the galaxy-upload utility to add the k2_core_nt database, or any local file, to one of your histories. I’ll link the documentation here: galaxy upload.
To run it, you just need to install it with either conda or pip, then run a single command:
$ galaxy-upload --url https://usegalaxy.org --api-key [your api key] --history-name [your history name] [local file path name]
You can access your API key through User → Preferences → Manage API Key. I’ve found this utility to be great for moving large data files around, both from local storage and from HPC systems.
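As a rough sketch of those steps, here is how the install and upload could be wrapped in a small shell function. The `galaxy-upload` flags are the ones from the command above; the `upload_to_history` function, the `DRY_RUN` and `GALAXY_API_KEY` variable names, and the history name are placeholders I made up for illustration. The pip package name is assumed to be `galaxy-upload`. By default it only prints the command so you can check it before sending a 233 GB file.

```shell
# Install once, assuming the package is named galaxy-upload:
#   pip install galaxy-upload

GALAXY_URL="https://usegalaxy.org"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = only print, 0 = really upload

# upload_to_history HISTORY_NAME LOCAL_FILE
# Prints the galaxy-upload command in dry-run mode, otherwise runs it.
upload_to_history() {
    history_name="$1"
    local_file="$2"
    if [ "$DRY_RUN" = "1" ]; then
        # The API key is masked so it does not end up in terminal logs.
        echo "galaxy-upload --url $GALAXY_URL --api-key <hidden> --history-name \"$history_name\" $local_file"
    else
        # GALAXY_API_KEY is an assumed variable name; export your key into it first.
        galaxy-upload --url "$GALAXY_URL" \
            --api-key "${GALAXY_API_KEY:?export GALAXY_API_KEY first}" \
            --history-name "$history_name" "$local_file"
    fi
}

# Example call; the history name is made up, the file name is from the question.
upload_to_history "kraken2 core_nt" "k2_core_nt_20240904.tar.gz"
```

Once the printed command looks right, run it again with `DRY_RUN=0` and your API key exported as `GALAXY_API_KEY`.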
In addition, I’ll reach out to our admins about including the new Kraken2 databases in the Galaxy databases so that you don’t even need to import it. We had discussed adding it before, and I’ll try to make sure that it gets added and that our internal configuration for Kraken2 can handle the larger database size.
Also, I believe that FTP upload is deprecated for usegalaxy.org, but I’ll try to follow up on that to confirm.
Alright, I just talked to our admin @nate, and he is going to update the existing databases and pull in the core_nt and GTDB databases from the Index zone by BenLangmead. They should be built into the Galaxy default databases once he is done.
Xref – one more too to add from earlier today – can you ask @nate Tyler? or maybe he’ll see this message? Thanks!
Hi tcollins2011,
I greatly appreciate the information about using the command line to add any local file. Additionally, I am excited to hear @nate will be pulling in the core_nt and GTDB databases!
Thank you so very much,
Laura
Good Day!
Since I do not know how long something like this is expected to take, I just wanted to reach out and ask for a ballpark estimate of when the core_nt database might be available.
Best wishes,
Laura
The core_nt database is really huge, and we have been able to host it at the EU server for now.
See → Kraken2 at UseGalaxy.eu
If you are not sure how to move data between servers, this guide can help. → FAQ: Transfer entire histories from one Galaxy server to another
Hope this helps!
Great! I know it is quite large, as I had managed to set it up in QIIME2 on my Mac but then ran out of storage to do anything else. I already have an EU account and can move data. I really appreciate the information.