Blast issue when trying to add species name: Fetch Taxonomic Ranks

Joel_Brown · August 20, 2025, 6:48pm

I’m running a blastn search to identify best hits for a large batch of sequences. I would like to return the scientific name for the best hit in the tabular data. I am using the following command:

blastn  -query '/anvil/scratch/x-xcgalaxy/main/staging/70028283/inputs/dataset_921a7f8f-68ab-47c1-a502-b5de65826183.dat'   -db '"/cvmfs/data.galaxyproject.org/byhand/blastdb/nt/2023-09-01/nt"'  -task 'blastn' -evalue '1e-05' -out '/anvil/scratch/x-xcgalaxy/main/staging/70028283/outputs/dataset_eb92d3c7-bb52-43d1-b0bb-2ad4416fbb01.dat' -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore  staxids sscinames scomnames'  -num_threads "${GALAXY_SLOTS:-8}" -strand both -dust yes  -max_hsps '3'

This works as expected except that the final columns containing the species identifications (14 and 15) are empty (“N/A”).

Upon closer inspection, I am getting the following warning:

Warning: [blastn] Taxonomy name lookup from taxid requires installation of taxdb database with ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
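
For anyone reproducing this with a local BLAST+ install rather than on a Galaxy server: the warning usually means BLAST could not find the unpacked taxdb files. The sketch below checks the current directory and every directory in the BLASTDB environment variable for them. The file names `taxdb.btd` and `taxdb.bti` are an assumption about the contents of `taxdb.tar.gz`, and the helper names are made up for illustration.

```python
import os
from pathlib import Path

# Assumed contents of the unpacked taxdb.tar.gz archive.
TAXDB_FILES = ("taxdb.btd", "taxdb.bti")


def candidate_dirs(environ=None):
    """Return the current directory plus every entry of BLASTDB."""
    environ = os.environ if environ is None else environ
    dirs = [Path.cwd()]
    blastdb = environ.get("BLASTDB", "")
    dirs += [Path(d) for d in blastdb.split(os.pathsep) if d]
    return dirs


def find_taxdb(search_dirs):
    """Return the first directory holding all taxdb files, or None."""
    for d in search_dirs:
        p = Path(d)
        if all((p / name).is_file() for name in TAXDB_FILES):
            return p
    return None


if __name__ == "__main__":
    hit = find_taxdb(candidate_dirs())
    if hit:
        print(f"taxdb found in {hit}")
    else:
        print("taxdb not found; sscinames/scomnames will likely be N/A")
```

This does not help on a public Galaxy server, where the database location is controlled by the administrators.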

How can the taxonomic database be added to the Galaxy servers?

Thanks!

Joel

jennaj · August 20, 2025, 7:37pm

Welcome @Joel_Brown

Yes, you are correct! The taxonomy functions are not wrapped into the BLAST+ functions directly. Instead, you can cut out the hits and run one of these tools to pull in the extra data you are interested in, then join columns as wanted.

Fetch taxonomic representation (find this at UseGalaxy.eu for now)
NCBI Datasets Gene download gene sequences and metadata

Placing all steps into a simple workflow is what most do. Then, that workflow functions in practical use as a “single custom tool” when you execute it.

For how to do this exactly, I’m guessing you already know about our tabular data parsing tools, but in case not, or for anyone else reading later on: please try a search in the tool panel with common utility names, or see our tutorial here for a short tour.

GTN Materials Search (query=olympics)

Please give that a try and let us know if you have any follow up questions!

Joel_Brown · August 23, 2025, 6:07pm

Thanks for the tips. I tried implementing your suggestions and here’s where I’m at.

I extracted the accession numbers from column 2 above so that I have a separate dataset containing a list of GenBank accession numbers:

For example:

CP079925.1
CP129672.1
CP006704.1
CP067086.1
CP043573.1
CP119313.1
CP012996.1
CP012996.1
CP017141.1
CP017141.1
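
Outside Galaxy, the same extraction can be sketched in a few lines of Python. This assumes the subject IDs in column 2 are either plain accessions or bar-delimited strings ending in the accession (for example `gi|...|gb|CP079925.1|`). The function names are illustrative only, and other ID layouts (such as PDB entries with a chain suffix) are not handled. Duplicates like the repeated CP012996.1 above are dropped while keeping first-seen order.

```python
def accession_from_sseqid(sseqid):
    """Return the last non-empty bar-delimited field of a subject ID."""
    parts = [p for p in sseqid.strip().split("|") if p]
    return parts[-1] if parts else ""


def unique_accessions(blast_lines, column=1):
    """Collect unique accessions from one column (0-based) of tabular BLAST output."""
    seen = set()
    ordered = []
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= column:
            continue  # skip short or blank lines
        acc = accession_from_sseqid(cols[column])
        if acc and acc not in seen:
            seen.add(acc)
            ordered.append(acc)
    return ordered
```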

I then used the NCBI Datasets Genome tool to download metadata using the following settings:

When I do so I get the following error:

Troubleshooting/Discussion:

I’ve tried similar approaches with just the GI number (instead of GB number)…no success.
I’ve tried using the NCBI Datasets Gene tool instead…no success.
I have a list of taxon IDs (column 13 in the OP). I haven’t found a way to easily convert these to species/common names.
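
One way to do the taxon ID conversion outside Galaxy is a lookup against `names.dmp` from the NCBI taxonomy dump. The sketch below assumes that file's usual layout (fields separated by tab, bar, tab, with a trailing tab and bar) and that the taxid sits in column 13 of the BLAST table, possibly as several IDs joined by semicolons. Function names are hypothetical, and this is not how the Galaxy tools do it internally.

```python
def load_names(handle):
    """Map taxid -> {'sci': name, 'com': name} from names.dmp-style lines."""
    table = {}
    for line in handle:
        line = line.rstrip("\n")
        if line.endswith("\t|"):
            line = line[:-2]  # drop the trailing field terminator
        fields = line.split("\t|\t")
        if len(fields) < 4:
            continue
        taxid, name, _unique, name_class = fields[:4]
        entry = table.setdefault(taxid, {})
        if name_class == "scientific name":
            entry["sci"] = name
        elif name_class == "genbank common name":
            entry["com"] = name  # preferred common name
        elif name_class == "common name":
            entry.setdefault("com", name)  # fallback only
    return table


def names_for(staxids, table, missing="N/A"):
    """Translate a semicolon-separated taxid string into two name strings."""
    ids = [t for t in staxids.split(";") if t]
    sci = ";".join(table.get(t, {}).get("sci", missing) for t in ids) or missing
    com = ";".join(table.get(t, {}).get("com", missing) for t in ids) or missing
    return sci, com


def annotate(blast_lines, table, taxid_col=12):
    """Yield rows with the columns after the taxid replaced by looked-up names."""
    for line in blast_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= taxid_col:
            yield "\t".join(cols)  # leave short rows untouched
            continue
        sci, com = names_for(cols[taxid_col], table)
        yield "\t".join(cols[: taxid_col + 1] + [sci, com])
```

Typical use would be `table = load_names(open("names.dmp"))` followed by looping over `annotate(open("blast.tsv"), table)`; both file names here are placeholders.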

Background (in case it helps):

I have a large dataset of ~700K reads acquired from an ice-age bison bone using ONT sequencing. I’m trying to filter out bacterial contaminants and identify closest matches for whatever is left. I’m using a kraken2 filtering step to classify reads as bacterial or unclassified (non-bacterial). I’m then BLASTing the “unclassified” reads to identify closest matches. I can get the taxid from the BLAST search, but I need to find an automated way to get scientific names/common names for my hits (preferably through Galaxy so that I can build this all into a workflow). Any suggestions related to this last step (or others along the way) are welcome.

Thanks!

Joel_Brown · August 23, 2025, 10:30pm

Update: I noticed the “Fetch Taxonomic Representation” tool you referenced in the edit. I tried this over at UseGalaxy.eu. It looks like exactly what I’m looking for…however, it’s still not working as expected.

See sample data below:

Using the following parameters…

…I’m getting the following error.

Looks pretty straightforward, but I can’t get it to work.

Joel

jennaj · August 25, 2025, 9:46pm

Hi @Joel_Brown

Glad you found the tool at the EU server!

These NCBI tools are a bit picky since these go remotely through their API. The 3rd field looks Ok (unless any are empty or have NA values?) but I am wondering if the 1st has too many characters or if the dashes are leading to an unexpected column split somewhere.

So, first confirm the 3rd column has valid values, then I would suggest trying to simplify the query names in column 1. The first part of the IDs appears to be the same, so maybe just keep the end? Underscores should be Ok but I would avoid other special characters.
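
As a rough illustration of that cleanup, assuming a tab-separated table with the query name in column 1 and a single numeric taxid in column 3, something like the following could be run before uploading the data. The function names and the dash-based splitting rule are guesses at the layout, not a description of what the tool requires.

```python
import re


def simplify_id(name, keep_parts=1):
    """Keep the last dash-separated part(s); replace other special characters with '_'."""
    tail = "_".join(name.strip().split("-")[-keep_parts:])
    return re.sub(r"[^A-Za-z0-9_]", "_", tail)


def is_valid_taxid(value):
    """True for a plain positive integer; False for empty, NA, or N/A values."""
    return re.fullmatch(r"[0-9]+", value.strip()) is not None


def clean_rows(lines, id_col=0, taxid_col=2):
    """Split rows into (cleaned, rejected) lists of tab-joined strings."""
    cleaned, rejected = [], []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(id_col, taxid_col) or not is_valid_taxid(cols[taxid_col]):
            rejected.append("\t".join(cols))
            continue
        cols[id_col] = simplify_id(cols[id_col])
        cols[taxid_col] = cols[taxid_col].strip()
        cleaned.append("\t".join(cols))
    return cleaned, rejected
```

Shortening IDs can make two different reads collide on the same name, so it is worth checking that the simplified names are still unique before joining the results back.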

In short, I think solving this is just fiddling with the formats a bit. Let us know if you can’t solve it and I’ll try to come up with an example that mimics yours for testing. Or, you can share yours back and I’ll experiment.