Hi Jennifer,
I already suspected an issue with the sequencing from the start. I expected Mycobacterium to be present in the samples, and my research made it clear that sequencing Mycobacterium is problematic. #1 Illumina sequencing is known to be problematic with high-GC bacteria, especially Mycobacterium. #2 The extraction kit from Zymo Research also mentions some issues with sequencing high-GC bacteria and recommends additional steps for them.
So I went into the sequencing anticipating some issues, though it turned out to be much more problematic than I had anticipated. Since Mycobacterium has a high lipid content in its cell wall, standard procedures don’t adequately compensate for this, causing sequencing issues. I have read many published studies and reports suggesting additional steps to adequately clean Mycobacterium DNA.
For standard samples, I completely agree that standard processing procedures are likely adequate. But when you anticipate problematic microbes in your samples, and it is hard to get the sequencing lab to adjust to your needs (without spending tons of money), my logic of rescuing reads makes sense, to me at least: if either a forward or a reverse read is of good quality, then it makes sense to keep it. Since R2 is just the reverse complement of R1, it’s theoretically not adding to or subtracting from the dataset. If there were only a small number of such reads, it would make sense to ignore them, but the loss would have been massive, especially in the initial sequencing done on the NovaSeq 6000: the 150 MB compressed files hold many millions of reads.
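To make the rescue logic concrete, here is a minimal Python sketch of the decision for one read pair. It is only an illustration, not the exact tool or settings I used: the function names, the mean-quality criterion, and the Q20 cutoff are all assumptions, and it assumes Phred+33 quality strings as found in standard Illumina FASTQ files.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def triage_pair(r1_quality, r2_quality, min_mean_q=20.0):
    """Decide what to do with one read pair.

    min_mean_q is a hypothetical cutoff chosen for illustration only.
    Returns "paired", "rescue_r1", "rescue_r2", or "discard".
    """
    r1_ok = mean_phred(r1_quality) >= min_mean_q
    r2_ok = mean_phred(r2_quality) >= min_mean_q
    if r1_ok and r2_ok:
        return "paired"      # both mates pass: keep as a normal pair
    if r1_ok:
        return "rescue_r1"   # only the forward read passes: keep it as a single read
    if r2_ok:
        return "rescue_r2"   # only the reverse read passes: keep it as a single read
    return "discard"         # neither mate passes


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33
print(triage_pair("IIIIIIII", "########"))
```

A standard pipeline that keeps only intact pairs would drop everything in the two "rescue" categories, which is where the loss comes from.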
Can you answer one question that I’m confused about? If I am trying to rescue both forward and reverse reads, is there any reason to reverse complement the R2 reads before concatenating them with the R1 reads?
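To be clear about the operation I am asking about, here is a minimal Python sketch of reverse complementing one FASTQ record. The function name is just illustrative. The one detail I am sure of is that the quality string has to be reversed together with the sequence so each score stays attached to its base.

```python
# Translation table for A/C/G/T/N in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_record(sequence, quality):
    """Return (sequence, quality) for the opposite strand of one read.

    The bases are complemented and reversed; the quality string is only
    reversed, so every score still lines up with the base it describes.
    """
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    return sequence.translate(_COMPLEMENT)[::-1], quality[::-1]


seq, qual = reverse_complement_record("AACGT", "IIII#")
print(seq, qual)
```

Whether this step is actually needed before concatenating is exactly what I am unsure about.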
Do you mean to attempt to recreate a pair? You will need to disclose what you have done in the sample notes, and find out whether others will be okay with (or in some situations expect) data pre-processed that way. The people at that archive would be the best ones to advise you on this, especially if what happened is common for that species. At a minimum, you would need to disclose what you did in the sample description, in the way other people usually disclose the same process, yes?
If this were just data for yourself, you could of course do whatever you want and see what happens, then explain it when you publish the results. But when publishing data to a sequence archive, I think there are larger considerations: whoever later uses those reads needs to know where they came from.
Hi Jennifer,
This data is for personal use at this time, though from what I have found so far, the right people may be interested in my findings. For example: 1. A person or animal with a malaria infection has a higher likelihood of coinfection with Mycobacterium as well as numerous other bacterial species. 2. Malaria is not considered contagious, yet I can show through the numerous samples I’ve run that it is indeed contagious person-to-person as well as person-to-animal (canine, anyway).
Patient #1 contracted an unknown infection from surgery at a hospital. Doctors refused to try to diagnose it. 16S sequencing shows a wide range of abnormal bacteria in blood and urine. Shotgun sequencing shows a massive amount of Plasmodium ovale (maybe smaller amounts of other species), as well as massive amounts of atypical Mycobacterium and other bacteria, in whole blood and in CSF leaking from the nose of Patient #1. The same organisms were also found in the spouse and the now-deceased dog of Patient #1. Repeat testing 6 months later showed the same infection in Patient #1, and now also in another dog belonging to Patient #1 and in both parents of Patient #1 (who live separately). So I’m dealing with a highly contagious disease that doctors are completely ignoring.
As for recreating a pair from a single read, I have heard the question asked before, and the response was that it’s not possible. From what I have done so far, though, it does seem to work with the method that I used. For experiments that sequence a single species, it makes sense to discard anything of lower quality. In my case, I want to know EVERYTHING that’s in the sample, as well as possible.