I’m not sure what to do with this. I have read that spliced alignment should be disabled for DNA reads with HISAT2. Is this necessary or not? I have my host-removed and trimmed reads, and I ran HISAT2 with spliced alignment disabled against a reference. That mapped fine. Then, as a test, I ran HISAT2 again on the unmapped reads without disabling spliced alignment, and it mapped many more reads. I took some of those reads (both forward and reverse) and BLASTed them on the NCBI BLAST site; every read I tested should have mapped the first time I ran HISAT2. I did not check the insert size, but maybe these reads had a much larger insert?
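For context, this is roughly the first command I ran (the index name and read file names are placeholders for my actual data):

```bash
# DNA reads, so spliced alignment is disabled.
# "ref_index" and the read files are placeholders.
hisat2 -x ref_index \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  --no-spliced-alignment \
  -S mapped.sam
```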
BLAST is not the same algorithm. Is it possible that the reads had a good match because BLAST does a local alignment?
I also wonder about using HISAT2 with spliced alignment disabled. Wouldn’t it be better to use Bowtie2 with the sensitive local preset (--sensitive-local) if you care about mapping more reads and don’t care about splicing?
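Untested, but I imagine it would look something like this (index and file names are placeholders):

```bash
# Local alignment with Bowtie2's sensitive local preset;
# "ref_index" and the read files are placeholders.
bowtie2 -x ref_index \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  --sensitive-local \
  -S mapped_local.sam
```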
I think I discovered my issue yesterday. I increased the maximum fragment length from 500 to 1000, and that appears to pick up the reads that were being missed. My guess is that HISAT2 in spliced mode was picking up reads with a larger fragment size because spliced mode doesn’t enforce a limit on fragment size.
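In case it helps anyone else, the change was just adding -X to raise the maximum fragment length (same placeholder names as above):

```bash
# -X / --maxins raises the maximum fragment length for valid
# paired-end alignments from the default of 500 to 1000.
hisat2 -x ref_index \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  --no-spliced-alignment \
  -X 1000 \
  -S mapped.sam
```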
I do plan to use Bowtie2 in local mode as well, but because my target genomes have high similarity to human DNA, I was worried that doing local alignment for human host removal was removing too many target reads. So my plan was to do end-to-end mapping first, then map my target reads, then continue again in local mode.
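As a sketch, the first two passes would look something like this (placeholder index and file names; --un-conc-gz keeps the pairs that did not align concordantly to human, and those then go into the target mapping):

```bash
# Pass 1: strict end-to-end host removal against human.
# Pairs that do NOT align concordantly are written via --un-conc-gz
# (the % in the path is replaced by 1 and 2).
bowtie2 -x human_index \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  --un-conc-gz nonhuman_R%.fastq.gz \
  -S /dev/null

# Pass 2: map the surviving pairs to the target genomes; a further
# local-mode host-removal pass would follow the same pattern.
bowtie2 -x target_index \
  -1 nonhuman_R1.fastq.gz -2 nonhuman_R2.fastq.gz \
  --sensitive-local \
  -S target_local.sam
```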
It sounds like you map against the human host reference in a separate step. Would it not be better to map against your human and target references in one step, since they are so similar? The mapper takes the best match per read, and afterwards you continue only with the reads that mapped to your target. If you do it separately, a read with three mismatches against human may still map and get removed, even though it has only one mismatch against your target.
I don’t know off the top of my head how to do it in Galaxy, but it is probably possible.
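Outside of Galaxy, on the command line, the one-step idea would look something like this (an untested sketch; file names are placeholders):

```bash
# Put human and target sequences into one index, so every read
# competes for its best match across both genomes at once.
cat human.fa target.fa > combined.fa
bowtie2-build combined.fa combined_index

bowtie2 -x combined_index \
  -1 trimmed_R1.fastq.gz -2 trimmed_R2.fastq.gz \
  -S combined.sam
```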
That’s what I would like to do, actually, but I have no idea how to do it in Galaxy. The BBSplit program from BBTools appears to be made exactly for this, but it’s not available in Galaxy. I know I can download the program, but I have had no luck getting it to work.
It sounds like I could technically map against both with Bowtie2 and use the BAM file, but I have no idea how to split the output.
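In case someone spots what I’m doing wrong, this is roughly the BBSplit command I was trying, based on the BBTools documentation (file names are placeholders; the % in basename is replaced with each reference name):

```bash
# BBSplit maps against both references at once and bins each read
# by its best match; unmapped pairs go to outu1/outu2.
bbsplit.sh in1=trimmed_R1.fastq.gz in2=trimmed_R2.fastq.gz \
  ref=human.fa,target.fa \
  basename=out_%.fq \
  outu1=unmapped_R1.fastq.gz outu2=unmapped_R2.fastq.gz
```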
This has been a problematic issue. I initially had no idea what was in the samples (human blood), and using the recommended Bowtie2/HISAT2 very sensitive local preset wipes out most of what I’m looking for. Then there is the question of which human reference to use, hg38 or T2T; hg38 didn’t produce good contigs with near-perfect identity, so I’m trying to work through it again with T2T.
The current problem is that even after mapping with very sensitive local against both hg38 and T2T, followed by mapping the genomes of interest, I still ended up with nearly 150 MB of paired reads. De novo assembly with metaSPAdes was still producing nearly complete contigs that matched my reference, which is why I was adjusting the maximum fragment size to see if that would pull out more reads.
After running Bowtie2 against the multiple references, I think you can split the reads with Samtools view: at the first parameter, select the option “A filtered/subsampled selection of reads”, and under “Filter by regions” fill in the name of the reference you want to keep. Then, to get a FASTA or FASTQ again, you can use the Samtools fastx tool. I have not tested this myself.
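For completeness, the command-line equivalent of those Galaxy steps would be something like this (also untested; “target_chr” is a placeholder for the name of the target sequence in the combined reference):

```bash
# Sort and index the BAM from the combined mapping, then keep only
# the reads aligned to the target sequence by naming it as a region.
samtools sort -o combined.sorted.bam combined.bam
samtools index combined.sorted.bam
samtools view -b combined.sorted.bam target_chr > target_only.bam

# samtools fastq wants name-collated input for paired output.
samtools collate -u -O target_only.bam | \
  samtools fastq -1 target_R1.fastq.gz -2 target_R2.fastq.gz \
    -s target_singletons.fastq.gz -
```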