FASTQ Mapping to Reference Issues

Jon_Colman · April 11, 2025, 8:03pm

I’m having some mapping issues and could use some suggestions. I’m working with WGS data from whole blood samples with unknown pathogens. I’ve used a few online WGS classifiers (1. alignment based, 2. k-mer based, and 3. KAIJU, protein-translation based), plus Kraken2 on Galaxy.

The k-mer based classifier, KAIJU and Kraken2 all show a very high number of reads from various Mycobacterium species, while the alignment-based classifier shows just a handful of reads. I tried to map to the Mycobacterium species and assemble, but can’t get good, consistent alignments with BLAST. I took some of the reads that the k-mer based classifier called Mycobacterium leprae. A BLAST search restricted to Mycobacterium showed that all the reads matched numerous strains of Mycobacterium tuberculosis (Oman strain), and KAIJU classified them as tuberculosis too. If I didn’t restrict the search to Mycobacterium, the top hit was often Naegleria fowleri strain NF001 at 99-100% identity over the full length of the read, though sometimes human was at the top with 1-2 bases different. So I got the Naegleria fowleri NF001 reference and chopped it up, and with the Kraken2 Mycobacterium V1 database it showed the same Mycobacterium and other mixed species. So it appears so far that I’m working with Naegleria fowleri NF001, which has many reads mapping 100% to tuberculosis.
So now I’m trying to map directly to Naegleria fowleri NF001, but I’m having issues. I tried BWA-MEM with default settings and got poor results: reads of 100-150bp were reported as mapped where BLASTn aligned only about a 30bp section to Naegleria fowleri NF001, so not even close to a full-length match.
What is the best way to get my Naegleria fowleri reads and separate them from the human reads? Do I keep everything that matches Naegleria fowleri at 100%, or do I remove everything that matches human hg38 at 100%? And what about other closely matching reads? All reads are fully trimmed, with adapters removed.

Thanks a bunch!!!

jennaj · April 17, 2025, 6:08pm

Hi @Jon_Colman

Yes, a complicated data situation. I’m sure you have seen the tutorials already. Let’s try to get some input from the microGalaxy community. I’ve cross-posted this to their chat on Matrix. They will probably reply in this forum topic, but feel free to join the chat too!

Peter_van_Heusden · April 18, 2025, 2:53pm

You wouldn’t really expect to find Mycobacteria in blood. If this is human blood I would definitely remove human reads first (e.g. using the approach in the tutorial “Hands-on: Removal of human reads from SARS-CoV-2 sequencing data”, under Sequence analysis) and then look at the remaining reads. Do you have any clinical suspicion as to what you’re looking at here?
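
As a rough illustration of the human-read removal step suggested above: once paired-end reads have been mapped against a human reference, the pairs in which neither mate aligned can be kept for downstream classification. The sketch below assumes SAM-format text as input and relies only on the standard SAM FLAG bits (0x4 read unmapped, 0x8 mate unmapped, 0x100 secondary, 0x800 supplementary). The function name is hypothetical, and this is not the tutorial’s actual workflow.

```python
def non_human_read_names(sam_lines):
    """Return names of read pairs with no alignment to the human reference.

    Assumes paired-end SAM text (an assumption for this sketch).
    Header lines starting with '@' and blank lines are skipped.
    """
    UNMAPPED, MATE_UNMAPPED = 0x4, 0x8
    SECONDARY, SUPPLEMENTARY = 0x100, 0x800
    keep = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        # Ignore secondary/supplementary records; they duplicate a primary one.
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        # Keep the pair only if this read AND its mate are both unmapped.
        if (flag & UNMAPPED) and (flag & MATE_UNMAPPED):
            keep.add(name)
    return keep
```

Requiring both mates to be unmapped is the conservative choice: a pair where only one mate hits human is discarded rather than kept. The same filter is commonly expressed with `samtools view -f 12`.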

Jon_Colman · April 22, 2025, 9:53pm

Hi Peter, I will fill you in a little on the history. This is from a suspected post-surgical infection where a hospital had an “outbreak” of a suspected bacterium. There were numerous deaths involved over several months, for what should be an infection treatable with antibiotics. Though I don’t know the specifics, only what is reported, it sounds suspicious to have all of these deaths from a common bacterium.
So I’m dealing with an undiagnosed situation from that outbreak. Since there was no testing done, I started with 16S samples that showed unusually high levels of bacteria, and followed this up with shotgun sequencing of whole blood. These were initially run through several online platforms (CZID.org, OneCodex.com, and KAIJU protein translated via Kbase.org), and all of these platforms reported Mycobacterium. Sticking with OneCodex and KAIJU, both initially reported Mycobacterium leprae at very high levels as well as Plasmodium ovale, then about a year ago KAIJU changed its identification to Mycobacterium tuberculosis. Until recently I didn’t know which reads they were classifying, but OneCodex revised their platform to allow downloading the reads classified to different species. So I downloaded the reads, blasted them individually, and assembled them against Mycobacterium. OneCodex was hitting on a small protein region that matched Mycobacterium leprae, but in a BLAST search against Mycobacterium all of the reads hit Mycobacterium tuberculosis strains of MTB-Oman. I tried to map and assemble to this strain, but was only coming up with contigs in the 200-450bp range, which were not consistent. So I took these same reads and just ran a standard BLAST on them, which gave me nearly perfect identity to Naegleria fowleri Karachi NF-001 (brain eating amoeba with 98% fatality). There is anecdotal evidence that this strain may be contagious.

I broke up the Naegleria fowleri reference FASTA into smaller pieces and mapped them against MTB-Oman, and a significant number of the pieces mapped. I also ran Kraken2 on them with the Mycobacterium database, and it showed hits very similar to the ones I was getting with my reads.
I’m currently re-cleaning my original reads, as my original cleanup left numerous adapters in place and also lost a large number of reads. I switched to cutadapt using either the Illumina Universal or Nextera Transposase adapter, and they trimmed well. So now I have many more, longer reads to work with.
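
The “break up the reference” check described above can be sketched roughly as follows: cut each reference sequence into overlapping, read-sized fragments so they can be run through the same mapper or classifier as the real reads. The fragment length, step size, naming scheme and function names are assumptions for illustration, not the values actually used.

```python
def read_fasta(lines):
    """Yield (name, sequence) tuples from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fragment_reference(lines, frag_len=150, step=75):
    """Yield (fragment_name, fragment_sequence) for overlapping windows.

    Names encode 1-based start and end coordinates on the source sequence.
    A trailing stretch shorter than one step may be left uncovered.
    """
    for name, seq in read_fasta(lines):
        if not seq:
            continue
        for start in range(0, max(len(seq) - frag_len, 0) + 1, step):
            frag = seq[start:start + frag_len]
            yield f"{name}_{start + 1}_{start + len(frag)}", frag
```

Each fragment can then be written out as a `>name` line followed by its sequence. If fragments of one organism’s reference classify or map confidently to another organism, that points at shared or contaminated sequence in the databases rather than at the sample itself.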

Part of the issue I’m having is that I can map to Naegleria fowleri NF-001 and assemble with metaSPAdes into long contigs of 4000-8000bp, and BLAST will give me a nearly 98-100% identity match, but it also matches human at a similar 98-100%. Running Kraken2 against the NCBI nt database at 0.1 and 3 matches picks up Naegleria fowleri and MTB-Oman.

Contagious - The original subject caught this from surgery. Blood samples from the spouse and 2 dogs from one household, and from the parents of the original subject living at a different location, all show similar species. Three of five dogs have died from this so far, and a 4th is probably not far away. All individuals have been in declining health since shortly after the surgery incident.

I have found this extremely difficult to diagnose, as none of the species are within the databases of the professional NGS classification services.
