Hi everyone
I used LoRDEC to correct my long reads with short reads, mapped the corrected reads to the yeast genome with minimap2, and then called variants. I gave up on long-read variant callers (Clair3, Longshot, etc.) because they called many false deletions at the start of homopolymer runs (stretches of the same base). I then used FreeBayes and got a set of variants that were clearly visible in IGV.

However, these variants were not confirmed by Sanger sequencing, so I compared BAM files in IGV for the long reads, the short reads, and the corrected long reads (all mapped to the yeast genome with minimap2). The corrected long-read BAM showed variants that were absent from both the long-read and short-read BAMs, and it was missing variants that were clearly visible in the long-read and short-read BAMs. Has anyone encountered this before? If so, how did you resolve it?
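To check whether suspect deletions really cluster at the start of homopolymer runs, one option is to flag each deletion whose first deleted base begins a run of identical reference bases. The sketch below is only an illustration: the function names, the 0-based position convention, and the minimum run length of 3 are all assumptions, not part of any of the tools mentioned above. Note also that VCF normalization usually left-aligns deletions inside a homopolymer, so a deletion reported at the start of a run may actually have occurred anywhere within it.

```python
def run_length_at(seq: str, pos: int) -> int:
    """Length of the run of identical bases starting at 0-based position pos."""
    base = seq[pos]
    end = pos
    while end < len(seq) and seq[end] == base:
        end += 1
    return end - pos


def is_homopolymer_start(seq: str, pos: int, min_run: int = 3) -> bool:
    """True if pos is the first base of a homopolymer of at least min_run bases.

    min_run=3 is an assumed threshold; tune it for your data.
    """
    if pos < 0 or pos >= len(seq):
        return False
    starts_run = pos == 0 or seq[pos - 1] != seq[pos]
    return starts_run and run_length_at(seq, pos) >= min_run


def flag_suspect_deletions(seq: str, deletion_starts, min_run: int = 3):
    """Hypothetical helper: deletion_starts are 0-based positions of the
    first deleted base. Returns those that sit at a homopolymer start."""
    return [p for p in deletion_starts if is_homopolymer_start(seq, p, min_run)]


if __name__ == "__main__":
    ref = "ACGTTTTTGCAAAAT"  # toy reference, not real yeast sequence
    print(flag_suspect_deletions(ref, [3, 5, 10, 1]))  # → [3, 10]
```

If most of a caller's deletions are flagged this way while the short-read data shows no deletion at those sites, that supports the homopolymer-error explanation.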
Welcome, @Derek_Wilkinson
It sounds like you have isolated the problem: the error correction step didn’t perform well. Maybe try different parameters? Or decide whether you need that step at all?
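One way to confirm that the correction step is the culprit, beyond eyeballing IGV, is to call variants on each BAM separately and compare the call sets. The sketch below assumes you already have one VCF per BAM (raw long reads, short reads, corrected long reads) and keys variants on (CHROM, POS, REF, ALT); the function names are hypothetical. Exact matching only works if the VCFs are normalized the same way (for example with `bcftools norm`), otherwise the same indel can appear under different coordinates.

```python
def load_variants(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF data lines, skipping headers."""
    variants = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic sites list several ALT alleles separated by commas.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def compare_call_sets(corrected, long_raw, short):
    """Variants introduced or lost by correction, relative to the raw data."""
    return {
        # Called from corrected reads but supported by neither raw data set.
        "only_in_corrected": corrected - (long_raw | short),
        # Present in both raw long and short reads but lost after correction.
        "missing_from_corrected": (long_raw & short) - corrected,
    }


if __name__ == "__main__":
    # Toy records for illustration only; not real results.
    corrected = load_variants(["chrI\t100\t.\tA\tG\t50\tPASS\t.",
                               "chrI\t200\t.\tC\tT\t50\tPASS\t."])
    long_raw = load_variants(["chrI\t100\t.\tA\tG\t40\tPASS\t.",
                              "chrI\t300\t.\tG\tA\t40\tPASS\t."])
    short = load_variants(["##fileformat=VCFv4.2",
                           "chrI\t300\t.\tG\tA\t60\tPASS\t."])
    print(compare_call_sets(corrected, long_raw, short))
```

A large "only_in_corrected" set alongside a non-empty "missing_from_corrected" set would match what you describe seeing in IGV.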
More ideas:
- We have tutorials that include HiFi reads, and those include QA steps. See GTN Materials Search
- There is also a PacBio-native mapping tool available: pbmm2.
Others are welcome to add more comments!
Thanks for the advice. I will look at the material in the link. In the meantime, I am working with the uncorrected long reads and trying different callers.
Regards,
Derek