Bowtie2 filtering reads

Hi everyone,

I’m trying to filter out reads that map to E. coli from BAM samples derived from paired-end (PE) sequencing.

Using Bowtie2 on Galaxy, I aligned the reads to E. coli K12 (a locally installed genome) with the --un-conc option set to YES. This gave me two separate files containing the R and L unmapped reads, which I then used as input for the alignment to hg38. Since the overall alignment rate of the filtered reads was very low, I BLASTed the unmapped reads against all genomes, and the hits have 100% identity with the E. coli genome (strains other than K12, but always E. coli). I also BLASTed a single unmapped read, and its identity with E. coli K12 and other strains is 100%.
Can anyone help me with this issue? I really do not understand why reads like this cannot be filtered out of my dataset, even though they share the same sequence across different E. coli strains.

Thank you,

Irene

Hi @bioiz

Bowtie2 only reports alignments that pass its own scoring criteria against the target.

You could try the reverse to remove the reads, that is, map to the human genome. The contamination should then fall out into the unmapped reads.
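
If you ever run this outside Galaxy, the reverse approach could look roughly like the sketch below. It only builds the command line and does not run Bowtie2. The index name, FASTQ names, and output names are placeholders I made up, and the helper bowtie2_host_filter_cmd is hypothetical; the options themselves (-x, -1, -2, -p, --un-conc, -S) are standard Bowtie2 options.

```python
import shlex


def bowtie2_host_filter_cmd(index, reads_1, reads_2, not_host_path, sam_out, threads=4):
    """Build (but do not run) a Bowtie2 command that maps read pairs to a
    host genome and writes pairs that fail to align concordantly elsewhere.

    All arguments are placeholders supplied by the caller; this helper is
    illustrative and not part of Bowtie2 or Galaxy.
    """
    return [
        "bowtie2",
        "-p", str(threads),          # number of alignment threads
        "-x", index,                 # basename of the Bowtie2 index (e.g. for hg38)
        "-1", reads_1,               # file with mate 1 reads
        "-2", reads_2,               # file with mate 2 reads
        "--un-conc", not_host_path,  # pairs that did not align concordantly
        "-S", sam_out,               # SAM output for the alignments
    ]


# Hypothetical file names, for illustration only.
cmd = bowtie2_host_filter_cmd(
    "hg38_index", "sample_R1.fastq", "sample_R2.fastq",
    "not_human.fastq", "human.sam",
)
print(shlex.join(cmd))
```

The pairs written via --un-conc would be the candidates for contamination, and the SAM output would hold the human alignments to carry forward.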

There is more going on in Bowtie2 mapping than a simple identity match. It is too much to list here, but BLAST works differently and can find hits beyond just an identity match. These tools work this way whether they are run in Galaxy or elsewhere.
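
One concrete piece of this, as a rough sketch rather than a full explanation: according to the Bowtie2 manual, --un-conc writes pairs that failed to align concordantly, which is a broader set than pairs where neither mate aligned. A mate in those files can therefore still match E. coli on its own, for example when its partner did not align or the pair's orientation or distance was off. If the Bowtie2 SAM/BAM output is available, the FLAG field shows which case each read falls into. The snippet below is plain Python with no Bowtie2 or pysam calls; classify_pair and its category names are made up for illustration, and it assumes concordant pairs carry the 0x2 (proper pair) bit.

```python
from collections import Counter

# Bit values defined by the SAM specification.
READ_PAIRED = 0x1
PROPER_PAIR = 0x2
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag):
    """Classify one SAM record by its FLAG value (illustrative categories)."""
    if not flag & READ_PAIRED:
        return "unpaired"
    if flag & PROPER_PAIR:
        # Assumption: the aligner sets 0x2 for concordant pairs.
        return "concordant"
    if flag & READ_UNMAPPED and flag & MATE_UNMAPPED:
        return "both_unmapped"
    # At least one mate aligned, but the pair is not concordant.
    return "aligned_not_concordant"


# Example FLAG values: 99 (proper pair), 77 (both mates unmapped),
# 65 (both mates aligned, not a proper pair), 73 (mate unmapped).
example_flags = [99, 77, 65, 73]
counts = Counter(classify_pair(f) for f in example_flags)
print(counts)
```

Under this reading, only the "both_unmapped" records are free of any E. coli alignment; the "aligned_not_concordant" ones would also end up in the --un-conc files even though at least one mate aligned.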

Maybe review all of these data together in a genome browser? You might be able to spot why Bowtie2 didn’t capture the hit. IGV is one popular choice.