I’m running a workflow that I’ve used successfully in the past: I use FLASh to merge overlapping paired-end reads into mini-contigs, then quality-filter those contigs to remove any in which fewer than 90% of bases are Q20 or better.
Normally, when I do this, I’m only interested in the reads where overlaps are found and a contig can be assembled. However, FLASh also creates FASTQ files containing the non-overlapping reads from R1 and R2 respectively - most of the time I just disregard these.
For the experiment I’m analyzing currently, I am interested in carrying forward these non-overlapping reads as well. So, as a first step, I’m inputting these files into fastq_quality_filter (from the FastX toolkit) to remove low-quality reads. The problem is that all of my FASTQs of non-overlapping reads are failing at this step, while the overlapping contigs file works fine. After a bit of digging, I realized that these particular files terminate because each has at least one read in the FASTQ where the sequence line is empty. Interestingly, the read with the missing sequence in the R1 file matches the one in the R2 file: the sequences for both reads of the same pair seem to be gone, and this is causing the filtering tool to abort.
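A check along these lines can confirm which records are affected and whether the same pairs are hit in both files. This is only a minimal Python sketch, assuming plain (uncompressed) four-line FASTQ records; the function names are hypothetical, and the header parsing (first whitespace-delimited field, with a trailing /1 or /2 removed) is an assumption about how the reads are named.

```python
from itertools import islice


def read_name(header):
    """Return the read name from a FASTQ header line (assumed naming scheme)."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    # Assumption: mates may be distinguished by a trailing /1 or /2.
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def empty_read_names(fastq_path):
    """List the names of records whose sequence line is empty."""
    names = []
    with open(fastq_path) as handle:
        while True:
            # Read one record: header, sequence, '+', quality.
            record = [line.rstrip("\n") for line in islice(handle, 4)]
            if not record or not any(record):
                break  # end of file (or only blank lines left)
            if len(record) < 4:
                raise ValueError("truncated FASTQ record near " + record[0])
            if not record[1].strip():
                names.append(read_name(record[0]))
    return names
```

Comparing `set(empty_read_names(r1_path))` with `set(empty_read_names(r2_path))` would then show whether exactly the same pairs are empty in R1 and R2.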
Does anyone have any insight into why this might be happening? Is there an easy fix? Right now I’m contemplating finding a tool to sanitize the FASTQs so that any reads with an empty sequence line simply get dropped. Does anyone have suggestions for how to do this?
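To make the kind of sanitizing I have in mind concrete, here is a minimal Python sketch rather than a tested tool. It assumes plain four-line FASTQ records and that R1 and R2 list their reads in the same order, and it drops a pair whenever either mate has an empty sequence line so the two files stay in sync. The function names and file names are hypothetical placeholders.

```python
from itertools import islice


def read_records(handle):
    """Yield FASTQ records as 4-item lists: header, sequence, '+', quality."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record or not any(record):
            return  # end of file (or only blank lines left)
        if len(record) < 4:
            raise ValueError("truncated FASTQ record near " + record[0])
        yield record


def drop_empty_pairs(r1_in, r2_in, r1_out, r2_out):
    """Copy read pairs to new files, skipping pairs with an empty sequence.

    Returns (kept, dropped) pair counts. Assumes both inputs hold the same
    reads in the same order; zip() stops at the shorter file.
    """
    kept = dropped = 0
    with open(r1_in) as f1, open(r2_in) as f2, \
            open(r1_out, "w") as o1, open(r2_out, "w") as o2:
        for rec1, rec2 in zip(read_records(f1), read_records(f2)):
            if rec1[1].strip() and rec2[1].strip():
                o1.write("\n".join(rec1) + "\n")
                o2.write("\n".join(rec2) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Something like `drop_empty_pairs("notCombined_1.fastq", "notCombined_2.fastq", "clean_1.fastq", "clean_2.fastq")` would then produce files that can be passed on to fastq_quality_filter. Dropping both mates together is a deliberate choice here, so that downstream paired-end tools still see matching records.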
Thanks for your help.