FASTQ Mapping to Reference Issues

I’m having some mapping issues and could use some suggestions. I’m working with WGS data from whole blood samples with unknown pathogens. I’m using a few online WGS classifiers: (1) an alignment-based one, (2) a k-mer-based one, and (3) Kaiju (protein translation-based), as well as Kraken2 on Galaxy.

  1. The k-mer-based classifier, Kaiju and Kraken2 all show a very high number of reads from various Mycobacterium species, whereas the alignment-based classifier shows just a handful. I tried to map and assemble the Mycobacterium species, but I can’t get good, consistent alignments with BLAST. I took some of the reads that the k-mer-based classifier called Mycobacterium leprae: when I restricted BLAST to Mycobacterium, all of them matched numerous M. tuberculosis strains, including the Oman strain (Kaiju had classified them as M. tuberculosis). If I didn’t restrict the search to Mycobacterium, however, the best match was Naegleria fowleri strain NF001, often at 99-100% identity over the full length of the read, although sometimes human was at the top with only 1-2 bases different. So I took the Naegleria fowleri NF001 reference, chopped it up into reads, and classified them with Kraken2 using the Mycobacterium V1 database: it showed the same Mycobacterium and other mixed species. So far, then, it appears that I’m working with Naegleria fowleri NF001, which has many reads matching M. tuberculosis at 100%.
  2. So now I’m trying to map directly to the Naegleria fowleri NF001 reference, but I’m having issues. I tried BWA-MEM with default settings and got very poor results: I was getting mapped reads of 100-150 bp where BLASTn aligned only about a 30 bp section to Naegleria fowleri NF001, so they weren’t even close to truly mapping.
  3. What is the best way to separate my Naegleria fowleri reads from the human reads? Do I keep everything that matches Naegleria fowleri at 100%, or remove everything that matches human hg38 at 100%? And what about other closely matching reads? All reads are fully trimmed, with adapters removed.
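The “chop the reference up” check in point 1 can be sketched in Python roughly as follows: tile the reference into overlapping, error-free pseudo-reads and write them out as FASTQ, which can then be classified with Kraken2. This is a minimal sketch, not the exact procedure used here; the 150 bp read length, 75 bp step and the placeholder quality character `I` are assumptions, and unlike a real read simulator it adds no sequencing errors.

```python
# Minimal sketch: tile a reference FASTA into error-free, fixed-length
# pseudo-reads so they can be run through a classifier such as Kraken2.
# Read length, step size and the constant quality character are assumptions.


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def tile_reads(name, seq, read_len=150, step=75):
    """Cut one sequence into overlapping pseudo-reads of exactly read_len bp."""
    reads = []
    # Sequences shorter than read_len produce no reads at all.
    for start in range(0, len(seq) - read_len + 1, step):
        reads.append((f"{name}_{start}", seq[start:start + read_len]))
    return reads


def to_fastq(reads, qual_char="I"):
    """Format (id, sequence) pairs as FASTQ with a constant placeholder quality."""
    return "".join(f"@{rid}\n{seq}\n+\n{qual_char * len(seq)}\n" for rid, seq in reads)


if __name__ == "__main__":
    toy = [">contig1 toy example", "ACGT" * 50, "TTGCA" * 20]  # 300 bp toy contig
    for name, seq in read_fasta(toy):
        print(to_fastq(tile_reads(name, seq)), end="")
```

Overlapping tiles (step smaller than read length) mean every position of the reference is covered by at least one full-length pseudo-read.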
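On point 2: BWA-MEM performs local alignment, so a read can be reported as mapped even when only a short stretch of it aligns and the rest is soft-clipped, which would fit the ~30 bp BLASTn hits described above. One way to screen for this is to compute, from each SAM record’s CIGAR string, how much of the read is actually aligned. A minimal Python sketch operating on SAM text lines; the 0.9 cutoff is an arbitrary assumption, and for real BAM files a library such as pysam, or samtools filtering, would be more practical.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar):
    """Fraction of the original read that lies in aligned (M, =, X) blocks.

    Insertions (I) and soft/hard clips (S, H) are read bases that are not
    aligned; deletions (D), skips (N) and padding (P) consume no read bases.
    """
    if cigar == "*":
        return 0.0
    aligned = read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n
            read_len += n
        elif op in "ISH":
            read_len += n
    return aligned / read_len if read_len else 0.0


def keep_alignment(sam_line, min_frac=0.9):
    """True for primary, mapped records whose alignment covers >= min_frac of the read."""
    if sam_line.startswith("@"):  # SAM header line, not an alignment
        return False
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4:  # read unmapped
        return False
    if flag & 0x900:  # secondary (0x100) or supplementary (0x800) record
        return False
    return aligned_fraction(cigar) >= min_frac


if __name__ == "__main__":
    for cigar in ("150M", "5S140M5S", "120S30M"):
        print(cigar, round(aligned_fraction(cigar), 3))
```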
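On point 3, an alternative to fixed 100% cutoffs is competitive assignment: align each read against human and N. fowleri together, then assign it to whichever genome explains it clearly better and set near-ties aside as ambiguous. A minimal Python sketch of that decision rule only, assuming you have already extracted each read’s best hit against each genome (aligned length and edit distance, e.g. from the SAM NM tag); the `Hit` class, the `target` label and the 0.9 / 0.02 thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best alignment of one read against one genome (hypothetical input)."""
    aligned_len: int  # read bases in aligned (M/=/X) blocks
    edits: int        # mismatches plus indel bases, e.g. the SAM NM tag


def matched_fraction(hit: Optional[Hit], read_len: int) -> float:
    """Simple score: aligned bases minus edits, as a share of the read (0 if no hit)."""
    if hit is None:
        return 0.0
    return max(hit.aligned_len - hit.edits, 0) / read_len


def assign(read_len: int, human: Optional[Hit], target: Optional[Hit],
           min_frac: float = 0.9, margin: float = 0.02) -> str:
    """Label a read 'human', 'target', 'ambiguous' or 'unassigned'.

    min_frac: the better hit must explain at least this share of the read.
    margin:   the winner must beat the other genome by at least this share,
              otherwise the read is set aside as ambiguous.
    """
    h = matched_fraction(human, read_len)
    t = matched_fraction(target, read_len)
    if max(h, t) < min_frac:
        return "unassigned"
    if abs(h - t) < margin:
        return "ambiguous"
    return "human" if h > t else "target"


if __name__ == "__main__":
    # A 150 bp read that matches N. fowleri perfectly and human with 2 edits
    # is too close to call at the default 2% (3 bp) margin.
    print(assign(150, Hit(150, 2), Hit(150, 0)))
```

Setting near-ties aside, rather than forcing them to one side, keeps reads where human differs by only 1-2 bases from silently inflating either count.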

Thanks a bunch!!!

Hi @Jon_Colman

Yes, this is a complicated data situation. I’m sure you have seen the tutorials already. Let’s try to get some input from the microGalaxy community: I’ve cross-posted this to their chat on Matrix. They will probably reply in this forum topic, but feel free to join the chat too!

You wouldn’t really expect to find mycobacteria in blood. If this is human blood, I would definitely remove the human reads first (e.g. using the approach in the Hands-on tutorial “Removal of human reads from SARS-CoV-2 sequencing data” in the Sequence analysis topic) and then look at the remaining reads. Do you have any clinical suspicion as to what you’re looking at here?
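At its core, the “remove human reads first” step amounts to dropping every read whose name appears in a set of human-mapped reads. A minimal Python sketch of just that filtering logic, assuming you already have the set of read IDs (for example, names of reads that mapped to hg38); the tutorial mentioned above uses Galaxy tools for the real workflow, and the `/1` / `/2` mate-suffix handling here is an assumption about how the reads are named.

```python
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records as lists of strings (newlines stripped)."""
    it = (line.rstrip("\n") for line in lines if line.strip())  # skip blank lines
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) < 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {rec!r}")
        yield rec


def read_id(header):
    """'@read1/1 comment' -> 'read1': drop '@', any comment, and a /1 or /2 mate suffix."""
    rid = header[1:].split()[0]
    if rid.endswith(("/1", "/2")):
        rid = rid[:-2]
    return rid


def remove_host_reads(lines, host_ids):
    """Yield FASTQ text for every record whose ID is NOT in host_ids."""
    for rec in fastq_records(lines):
        if read_id(rec[0]) not in host_ids:
            yield "\n".join(rec) + "\n"


if __name__ == "__main__":
    demo = ["@r1/1\n", "ACGT\n", "+\n", "IIII\n",
            "@r2/1 extra\n", "GGCC\n", "+\n", "IIII\n"]
    # Pretend r1 was flagged as human (e.g. it mapped to hg38).
    print("".join(remove_host_reads(demo, {"r1"})), end="")
```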