The accuracy of re-blasting known sequences of human gut microbes is very low

I selected the Gastrointestinal Bacteria Culture Collection (derived from the Human Gastrointestinal Bacteria Culture Collection), randomly extracted 100 bp fragments, and ran them back through BLASTN. However, the accuracy was only 13% at the genus level and 7% at the species level. Why is this? I am very confused.

Welcome, @houzhi

If you are not getting the expected hits with BLASTN (BLAST+), several factors could be affecting the results.

  • Scientific factors: the expected content of the query reads, and the content of the target database.
  • Technical factors: commonly the query read quality and length, the match parameters applied, and how the results are filtered and ranked after the search.
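To illustrate the last technical point: a 100 bp fragment can match several closely related genomes with exactly the same bit score, and if only the first listed hit is kept, the genus or species call among those ties is effectively arbitrary. The Python below is a minimal, hypothetical sketch, not a Galaxy tool or the original authors' method. It assumes default BLAST+ tabular output (`-outfmt 6`, where column 12 is the bit score), a hand-made `taxonomy` dict mapping each subject ID to its genus and species (in a real database you may first need a contig-to-genome lookup), and a `truth` dict recording which genome each fragment was drawn from. A query only counts as correct when all of its top-scoring hits agree at the chosen rank. The toy records at the end are invented for illustration.

```python
from collections import defaultdict

# Default BLAST+ tabular output (-outfmt 6): column 1 = query ID,
# column 2 = subject ID, column 12 = bit score (0-based indices below).
QSEQID, SSEQID, BITSCORE = 0, 1, 11


def parse_outfmt6(lines):
    """Group hits by query ID: {query_id: [(subject_id, bitscore), ...]}."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        hits[cols[QSEQID]].append((cols[SSEQID], float(cols[BITSCORE])))
    return hits


def assign_taxon(query_hits, taxonomy, rank):
    """Taxon at `rank` shared by every top-bitscore hit, or None if the tied hits disagree."""
    best = max(score for _, score in query_hits)
    tied = [sid for sid, score in query_hits if score == best]
    taxa = {taxonomy[sid][rank] for sid in tied}
    return taxa.pop() if len(taxa) == 1 else None


def accuracy(hits, truth, taxonomy, rank):
    """Share of queries whose unambiguous top-hit taxon matches the source genome's taxon.

    Queries with no hits, or with tied top hits that disagree, count as incorrect.
    """
    correct = 0
    for qid, source_genome in truth.items():
        if qid in hits and assign_taxon(hits[qid], taxonomy, rank) == taxonomy[source_genome][rank]:
            correct += 1
    return correct / len(truth)


# Hypothetical toy data: genome IDs, taxa, and hits are invented for illustration.
taxonomy = {
    "gA": {"genus": "Bacteroides", "species": "Bacteroides ovatus"},
    "gB": {"genus": "Bacteroides", "species": "Bacteroides xylanisolvens"},
    "gC": {"genus": "Blautia", "species": "Blautia obeum"},
}
blast_lines = [
    "frag1\tgA\t100.0\t100\t0\t0\t1\t100\t501\t600\t1e-45\t185\n",
    "frag1\tgB\t100.0\t100\t0\t0\t1\t100\t901\t1000\t1e-45\t185\n",
    "frag2\tgC\t99.0\t100\t1\t0\t1\t100\t11\t110\t2e-44\t180\n",
]
truth = {"frag1": "gA", "frag2": "gC", "frag3": "gA"}  # frag3 returned no hits

hits = parse_outfmt6(blast_lines)
print("genus accuracy:", accuracy(hits, truth, taxonomy, "genus"))
print("species accuracy:", accuracy(hits, truth, taxonomy, "species"))
```

With these made-up records, `frag1` ties between two Bacteroides genomes, so it is correct at the genus level but ambiguous at the species level, and `frag3` has no hits at all; the script reports 2/3 accuracy at genus level and 1/3 at species level. Counting ambiguous ties separately from outright wrong calls can help show whether low species-level accuracy comes from fragments that simply cannot be resolved at 100 bp rather than from misassignments.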

If you want to review our example protocols and template workflows for Microbiome analysis, please see:

Beyond that, the authors of the original data used several different strategies, so you could explore their methods if you want to replicate their results. Many of the exact tools they mention have been wrapped for Galaxy.

Xref: A human gut bacterial genome and culture collection for improved metagenomic analyses (Nature Biotechnology)

Hope this helps! :slight_smile: