I’m trying to filter out reads that map to E. coli from BAM samples derived from paired-end (PE) sequencing.
Using Bowtie2 on Galaxy, I aligned the reads to E. coli K12 (a locally installed genome) and set the --un-conc parameter to YES. This gave me two separate files containing the R and L unmapped reads, which I used as input for the alignment to hg38. Since the filtered reads had a very low overall alignment rate, I BLASTed the unmapped ones against all genomes, and the hits show 100% identity with E. coli genomes (strains other than K12, but always E. coli). I also tried BLASTing a single unmapped read, and its identity with E. coli K12 and other strains is 100%.
Can anyone help me with this issue? I really do not understand why reads like this cannot be filtered out of my dataset even though they share the same sequence across different E. coli strains.
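
For reference, a rough command-line equivalent of the two Galaxy steps described above might look like the sketch below. It only builds and prints the Bowtie2 commands and does not run them. The index names, file names, thread count, and the helper `bowtie2_cmd` are hypothetical placeholders, not the actual Galaxy settings.

```python
import shlex


def bowtie2_cmd(index, reads_1, reads_2, sam_out, un_conc=None, threads=4):
    """Build a Bowtie2 paired-end command as an argument list (does not run it)."""
    cmd = [
        "bowtie2",
        "-p", str(threads),   # number of threads (assumed value)
        "-x", index,          # basename of the Bowtie2 index
        "-1", reads_1,        # mate 1 FASTQ
        "-2", reads_2,        # mate 2 FASTQ
        "-S", sam_out,        # SAM output
    ]
    if un_conc is not None:
        # --un-conc writes the pairs that did not align concordantly.
        # A '%' in the path is replaced by 1 or 2 for the two mate files.
        cmd += ["--un-conc", un_conc]
    return cmd


# Step 1: align to E. coli K12 and keep the pairs that did not align concordantly.
step1 = bowtie2_cmd(
    "ecoli_k12",                  # hypothetical index name
    "sample_R1.fastq",            # hypothetical input files
    "sample_R2.fastq",
    "ecoli.sam",
    un_conc="not_ecoli_R%.fastq",
)

# Step 2: align the pairs kept in step 1 to hg38.
step2 = bowtie2_cmd(
    "hg38",                       # hypothetical index name
    "not_ecoli_R1.fastq",
    "not_ecoli_R2.fastq",
    "hg38.sam",
)

for step in (step1, step2):
    print(shlex.join(step))
```

Printing the commands rather than running them makes it easy to compare them against the parameters Galaxy reports for each job.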