Hi Jennifer,
I already suspected an issue with the sequencing from the start. I expected Mycobacterium to be present in the samples, and in my research it was clear that sequencing Mycobacterium is problematic. #1 Illumina sequencing is known to be problematic with high-GC bacteria, especially Mycobacterium. #2 The extraction kit by Zymo Research also mentions some issues with sequencing high-GC bacteria, and recommends additional steps for them.
So I went into the sequencing anticipating some sort of issue, though it was much more problematic than I had anticipated. Mycobacterium has a high lipid content in its cell wall, and standard procedures don’t adequately compensate for this, which causes sequencing issues. I have read many published studies and reports suggesting additional steps to adequately clean Mycobacterium DNA.
For standard samples, I completely agree that standard processing procedures are likely adequate. But when you anticipate problematic microbes in your samples, and it is hard to get the sequencing lab to adjust to your needs (without spending tons of money), I think a different approach is justified. My logic of rescuing reads makes sense, to me at least: if either the forward or the reverse read is of good quality, then it makes sense to keep it. Since R2 comes from the same fragment as R1, read from the opposite strand (so it is the reverse complement of R1 wherever the two overlap), keeping a lone R1 or R2 is theoretically not adding to or subtracting from the dataset. If only a small number of reads were affected, it would make sense to ignore them, but especially in the initial sequencing done on the NovaSeq 6000 the loss would have been massive: the 150 MB compressed files hold many millions of reads.
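To make the rescue logic concrete, here is a rough Python sketch of what I mean. It is only an illustration, not the exact pipeline I ran: the function names, the mean-quality criterion, and the threshold of Q20 are all assumptions made up for this example, and it assumes Phred+33 quality encoding.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def rescue_pair(r1, r2, min_mean_q=20.0):
    """Classify one read pair; r1 and r2 are (sequence, quality) tuples.

    Hypothetical rule: a mate passes if its mean Phred score is at
    least min_mean_q. Returns ("paired", [r1, r2]) if both pass,
    ("single", [mate]) if only one passes, and ("dropped", []) if
    neither does.
    """
    kept = [read for read in (r1, r2) if mean_phred(read[1]) >= min_mean_q]
    if len(kept) == 2:
        return "paired", kept
    if len(kept) == 1:
        return "single", kept
    return "dropped", []


# Example: R1 is high quality ('I' = Q40), R2 is poor ('#' = Q2),
# so only R1 is rescued as a single-end read.
status, reads = rescue_pair(("ACGT", "IIII"), ("TTGA", "####"))
print(status, reads)
```

The point is simply that a pair is not thrown away just because one mate fails; the surviving mate goes into a separate single-end set.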
Can you answer one question that I’m confused about? If I am trying to rescue both forward and reverse reads, is there any reason to reverse complement the R2 reads before concatenating them with the R1 reads?
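Just so we are talking about the same operation, this is a minimal Python sketch of what I mean by reverse complementing an R2 read. The helper name is hypothetical. The quality string is reversed along with the sequence so that each score stays attached to its base.

```python
# Translation table for DNA bases; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_read(sequence, quality):
    """Return the reverse-complemented sequence and the reversed quality string."""
    return sequence.translate(_COMPLEMENT)[::-1], quality[::-1]


# Example with a made-up 5-base read.
print(reverse_complement_read("AACGT", "IIH#!"))
```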