Remove PCR duplicates without mapping first

scca · September 24, 2024, 9:13pm

I have been trying to remove PCR duplicates from my reads without much success. Since this is a metagenomic screening assay, I do not have a reference sequence to map the reads to before duplicate removal (as required by UMITools-deduplicate and others).

I have a UMI sequence in each read and can extract it into the FASTQ header using UMITools-extract, but I can’t seem to get past this point. All of the tools I have found either require mapping before de-duplication or only remove EXACT matches. Is there a way to remove duplicate FASTQ reads based on the UMI sequence alone, allowing for one or two mismatches? I have been looking at pRESTO CollapseSeq, but can’t figure out what to set for its parameters. Would this tool work if I could figure out how to use it?
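To make the request concrete, here is a rough Python sketch of the kind of mapping-free, UMI-only de-duplication described above. It is not taken from UMI-tools or pRESTO; it is only an illustration under stated assumptions. It assumes the UMI has already been appended to the read ID after a final underscore (for example `@read1_ACGT`), and it uses a simple greedy grouping: UMIs are visited from most to least frequent, and each one is merged into the first already-accepted UMI within `max_mismatches` substitutions. The function names `parse_fastq`, `umi_of`, and `dedup_by_umi` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Count mismatching positions; UMIs of unequal length never match."""
    if len(a) != len(b):
        return float("inf")
    return sum(x != y for x, y in zip(a, b))


def parse_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of FASTQ lines."""
    stripped = (line.rstrip("\n") for line in lines)
    # Consume the same iterator four lines at a time.
    yield from zip(stripped, stripped, stripped, stripped)


def umi_of(header):
    """Assumption: the read ID is the first token and ends in '_<UMI>'."""
    return header.split()[0].rsplit("_", 1)[-1]


def dedup_by_umi(records, max_mismatches=1):
    """Keep the first read seen for each group of similar UMIs."""
    records = list(records)
    counts = Counter(umi_of(rec[0]) for rec in records)

    representatives = []  # accepted UMIs, most frequent first
    assigned = {}         # every observed UMI -> its representative
    for umi, _ in counts.most_common():
        for rep in representatives:
            if hamming(umi, rep) <= max_mismatches:
                assigned[umi] = rep
                break
        else:
            representatives.append(umi)
            assigned[umi] = umi

    seen = set()
    kept = []
    for rec in records:
        rep = assigned[umi_of(rec[0])]
        if rep not in seen:
            seen.add(rep)
            kept.append(rec)
    return kept


example = """@r1_AAAA
ACGTACGT
+
IIIIIIII
@r2_AAAA
ACGTACGT
+
IIIIIIII
@r3_AAAT
ACGTACGA
+
IIIIIIII
@r4_CCCC
TTTTGGGG
+
IIIIIIII
@r5_CCGG
TTTTGGGA
+
IIIIIIII
""".splitlines()

for header, seq, plus, qual in dedup_by_umi(parse_fastq(example), max_mismatches=1):
    print(header, seq)
```

For a real file, an open file handle can be passed to `parse_fastq` in place of the example list. Note that keying on the UMI alone also collapses unrelated molecules that happen to share a UMI, and the pairwise comparison grows slowly for very large numbers of distinct UMIs; both are properties of this sketch.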
