WGS Alignments that tolerate large, unknown, non-genomic insertions

I completed a WGS alignment of 150 bp paired-end FASTQ reads against a locally installed plant genome using Bowtie2. The alignment worked well, and I am able to visualize the reads in the Integrative Genomics Viewer (IGV). However, at my site of interest I see a large set of paired reads where the mate is not mapped. I suspect there is a non-reference sequence insertion at that site that Bowtie2 cannot align, so the reads covering it are discarded during processing, and there doesn’t seem to be an intuitive way to find my discarded reads of interest.

Is anyone aware of workflows on Galaxy that tolerate non-genomic sequences better? Perhaps de novo genome assembly workflows that can handle large genomes (400 Mbp)? Apologies in advance if this is not a Galaxy question and belongs on a more bioinformatics-oriented forum.


Hi @Dhruv_Patel

Reads that do not align with the reference can be output by the tool. See the option “Write unaligned reads (in fastq format) to separate file(s)” on the Bowtie2 tool form (near the top).
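If the alignment output still contains the unaligned records, the read names at the site of interest can also be listed directly from the SAM text. The following is only a minimal sketch in plain Python: the function name, the chromosome name `chr1`, the coordinates, and the example records are all made up for illustration, and it assumes the alignments have been converted to SAM text. It relies on the SAM flag bits 0x4 (read unmapped) and 0x8 (mate unmapped).

```python
# SAM flag bits as defined in the SAM specification
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8

def reads_with_unmapped_mate(sam_lines, chrom, start, end):
    """Return names of mapped reads in chrom:start-end whose mate is unmapped."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # a SAM record has 11 mandatory fields
            continue
        qname, flag = fields[0], int(fields[1])
        rname, pos = fields[2], int(fields[3])
        if rname != chrom or not (start <= pos <= end):
            continue
        if flag & READ_UNMAPPED:          # keep only the mapped end of the pair
            continue
        if flag & MATE_UNMAPPED:
            names.append(qname)
    return names

# Hypothetical records: readA has an unmapped mate, readB is a proper pair,
# readC has an unmapped mate but sits on another chromosome.
EXAMPLE_SAM = [
    "@HD\tVN:1.6\tSO:coordinate",
    "readA\t73\tchr1\t1000\t42\t10M\t=\t1000\t0\tACGTACGTAC\tIIIIIIIIII",
    "readA\t133\tchr1\t1000\t0\t*\t=\t1000\t0\tTTTTTTTTTT\tIIIIIIIIII",
    "readB\t99\tchr1\t1200\t42\t10M\t=\t1400\t210\tACGTACGTAC\tIIIIIIIIII",
    "readC\t73\tchr2\t500\t42\t10M\t=\t500\t0\tACGTACGTAC\tIIIIIIIIII",
]

if __name__ == "__main__":
    print(reads_with_unmapped_mate(EXAMPLE_SAM, "chr1", 900, 1500))
```

The resulting read names could then be used to pull the matching unaligned mates out of the separate FASTQ file and inspect or assemble them.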

And you could certainly try assembly options to reconstruct any novel regions. Tutorials:
