Convert Single End Read to Paired End??

Jon_Colman · October 17, 2024, 9:55pm

So I have some sequences done on the NovaSeq 6000, where a large portion of R2 reads are all poly-G, or I lose one of R1 or R2 after trimming. I am wondering about the possibility of taking the unpaired reads and using the reverse complement to turn each one into a paired read?

Essentially taking the “unpaired” GOOD reads, reverse complementing them, using FASTQ to SAM to set the forward and reverse reads, and finally using SAM to FASTQ to put them back into an R1/R2 orientation that I can concatenate onto the rest of my reads.

In my head this makes sense, as I recall Trimmomatic does somewhat the same thing for overlapping pairs?
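For reference, the reverse-complement step described above could be sketched in plain Python as follows. This is only an illustration of the idea, not a Galaxy tool or a recommended workflow; the function names and the `/1` and `/2` name suffixes are hypothetical choices made for the example.

```python
# Sketch: build a synthetic mate for an orphaned read by reverse-complementing it.
# The "/1" and "/2" suffixes below are illustrative assumptions, not a requirement.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def synthetic_mate(name, seq, qual):
    """Return (name, seq, qual) for a fake mate derived from a single read.

    The quality string is reversed so each score stays attached to its base.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return name, reverse_complement(seq), qual[::-1]


def to_fastq(name, seq, qual):
    """Format one record as four FASTQ lines."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


read = ("read1", "ACGGTN", "IIFF#!")
mate = synthetic_mate(*read)
print(to_fastq(read[0] + "/1", read[1], read[2]), end="")
print(to_fastq(mate[0] + "/2", mate[1], mate[2]), end="")
```

Note that a mate built this way carries no information beyond the original read: the "pair" overlaps completely, so the apparent insert size equals the read length.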

igor · October 19, 2024, 10:51pm

Hi @Jon_Colman,

Technically, you can produce reverse-complement sequences, but I do not recommend it. A better approach might be to use the good PE and SE data together.

I am not aware of this feature in Trimmomatic.

Kind regards,
Igor

Jon_Colman · October 20, 2024, 3:13am

The problem is that I have a massive amount of reads, say 100 MB compressed, that aren’t paired due to poly-G tails. Cloud-based platforms want either paired or single reads, not paired reads with unpaired singles.

Jon_Colman · October 20, 2024, 10:17pm

I have a mapping question, Igor. I have had issues with host removal, in that it is removing a LOT of the reads from the species that I’m looking for. One of these is Plasmodium (malaria), for which I know at least part of the genome is a 95%+ match to human. Is there a Galaxy program and/or settings that I can try so that only reads with a 100% match to human are removed (I assume the newer T2T reference is probably best)? I don’t mind if I miss some of the human reads, I just don’t want to miss microbial reads.

Jon

igor · October 21, 2024, 12:37am

Some protocols can tolerate PE and SE data in a single alignment. You can map PE and SE separately and merge the BAM files.
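Outside Galaxy, the "map separately, then merge" approach could look roughly like the sketch below. It only builds the command strings and runs nothing. The choice of `bwa mem` as the aligner, the helper name `mapping_commands`, and all file names are assumptions for illustration; the thread does not say which aligner is in use.

```python
import shlex


def mapping_commands(ref, r1, r2, singles, out="merged.bam"):
    """Return shell command strings for mapping PE and SE reads separately.

    Nothing is executed here. bwa mem and samtools are assumed to be
    installed, and the reference is assumed to be indexed already.
    """
    q = shlex.quote
    # Paired-end reads: both mate files are passed to the aligner together.
    pe = f"bwa mem {q(ref)} {q(r1)} {q(r2)} | samtools sort -o pe.bam -"
    # Orphaned single reads are mapped on their own, in single-end mode.
    se = f"bwa mem {q(ref)} {q(singles)} | samtools sort -o se.bam -"
    # Merge the two coordinate-sorted BAM files into one, then index it.
    merge = f"samtools merge {q(out)} pe.bam se.bam"
    index = f"samtools index {q(out)}"
    return [pe, se, merge, index]


# Hypothetical file names, for illustration only.
for cmd in mapping_commands("ref.fa", "R1.fq.gz", "R2.fq.gz", "orphans.fq.gz"):
    print(cmd)
```

In Galaxy the same idea would be two mapping jobs (one paired, one single) followed by a BAM merge tool.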

100 MB: it depends on context. If you have 2x4 GB PE files, 0.1 TB is rather small.

Generally, people get an excessive amount of data these days, so losing some may not be a big deal, but it depends on the individual situation.

igor · October 21, 2024, 1:02am

0.1 GB, sorry for the typo.

I am surprised by sequences having such strong similarity, 95%, between the human and Plasmodium genomes. If many reads are filtered, the sequence(s) must be of a reasonable length. I am not saying it is impossible, but I recollect multiple stories about foreign sequences present in genome assemblies in the early days of genomics.

You can increase the mismatch and gap penalties in the advanced settings of the aligner used for read mapping, but it is a double-edged sword: many human reads will end up unmapped.
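A different way to get at the same goal, instead of tuning penalties, is to map with default settings and then treat a read as host only if its alignment is end-to-end and perfect. The sketch below is one possible version of that post-filter, not something proposed in the thread. It assumes you have the CIGAR string and the `NM` (edit distance) tag for each alignment; the function names and the `min_identity` parameter are hypothetical.

```python
import re

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, nm):
    """Return the fraction of aligned columns that match.

    Aligned columns are counted from M, =, X, I and D operations; nm is the
    NM tag (mismatches plus inserted and deleted bases). Unmapped reads,
    whose CIGAR is "*", get 0.0.
    """
    if cigar == "*":
        return 0.0
    columns = sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in "M=XID")
    return (columns - nm) / columns if columns else 0.0


def is_confident_host(cigar, nm, min_identity=1.0):
    """True only for unclipped alignments at or above min_identity."""
    clipped = any(op in "SH" for _, op in _CIGAR_OP.findall(cigar))
    return not clipped and alignment_identity(cigar, nm) >= min_identity


# Only the first alignment would be removed as host with the default threshold.
for cigar, nm in [("100M", 0), ("100M", 5), ("90M10S", 0)]:
    print(cigar, nm, is_confident_host(cigar, nm))
```

With `min_identity=1.0`, a read that matches human at 95% is kept rather than discarded, at the cost of leaving some true human reads in the data.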

Kind regards,
Igor

Jon_Colman · October 21, 2024, 1:49am

The problem with my files is that, from what I can see, it’s the microbes with the bad reads, whereas the host reads look good. I have changed my sequencing methods to get better quality, but I am still trying to salvage these.
