Deduplication for miRNA analysis

amir · September 21, 2021, 6:51am

Hello

What is the best tool for deduplicating my data?
Let's say I have more than 50% duplication in my reads. How should I remove the duplicates?
I have seen some tools, but I'm not sure which one is best for my analysis.

Thanks

jennaj · September 21, 2021, 4:56pm

Hi @amir

Mapping the reads, marking duplicates in BAM results, then removing or ignoring them is one way. Samtools and Picard have tools to mark and/or remove duplicates.

Variant analysis tutorials have example workflows. For other protocols, search with the keyword “duplicates” or review the protocols themselves as many include QA/QC steps for addressing PCR duplicates. https://training.galaxyproject.org/

You can also search the tool panel – some tool suites have tools specific to the analysis goals, and not all are incorporated into a GTN tutorial (external documentation is usually linked on the tool form).

The question is pretty broad. If you have a specific analysis goal in mind and can’t find help in the above resources, please describe what you are doing in more detail: the read protocol, the planned analysis tools and goal, and anything else special about your data (for example, how the 50% duplicate rate was determined, if that has been done already).
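For reference, one simple way a sequence-level duplicate rate can be computed is sketched below in Python. This is only an illustration under stated assumptions: the function names `duplication_rate` and `read_fastq_sequences` and the toy FASTQ records are hypothetical, a "duplicate" here means an exact copy of an earlier read sequence, and QC tools may estimate the value differently, so their numbers need not match.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line (2nd line of every 4-line record) from FASTQ text."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def duplication_rate(reads):
    """Return the fraction of reads that are exact copies of another read.

    Assumption: duplicates are defined purely by identical sequence,
    with no mapping position or UMI information.
    """
    reads = list(reads)
    if not reads:
        return 0.0
    unique = len(Counter(reads))          # number of distinct sequences
    return (len(reads) - unique) / len(reads)


# Toy FASTQ content (hypothetical reads): three identical reads plus one other.
toy_fastq = """@r1
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r2
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r3
TGAGGTAGTAGGTTGTATAGTT
+
IIIIIIIIIIIIIIIIIIIIII
@r4
TAGCTTATCAGACTGATGTTGA
+
IIIIIIIIIIIIIIIIIIIIII
""".splitlines()

rate = duplication_rate(read_fastq_sequences(toy_fastq))
print(f"duplicate fraction: {rate:.2f}")
```

With these four toy reads, two of them are counted as copies, so the reported fraction is 0.50.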

amir · September 26, 2021, 9:00pm

I am working on small RNA-seq (miRNA) data, and when I filter on GC content and duplication, my counts drop a lot.
This happens especially when I correct for GC percentage.
For miRNAs, should I do anything about GC content and duplication level at all, or leave the data as it is?

Best.

gallardoalba · September 27, 2021, 10:13am

Hi @amir,
in the case of miRNA, I suggest you go on with the analysis without considering the GC and duplication content, for these reasons:

miRNA sequences usually contain overrepresented motifs that can interfere with the real duplication values (because in many cases they originated from duplications).
The miRNA GC content is quite variable and, depending on the organism, the miRNAs can be enriched in those nucleotides (Lam et al., Mishra et al.).
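A related practical point can be shown with a small Python sketch: mature miRNAs are only about 22 nt long, so reads from the same miRNA are expected to be identical, and sequence-based deduplication (without UMIs) collapses exactly the signal that counting relies on. The read sequences, the 900/100 read numbers, and the variable names below are made up for illustration and are not taken from any real dataset.

```python
from collections import Counter

# Two toy miRNA-like read sequences (hypothetical data).
mirna_a = "TGAGGTAGTAGGTTGTATAGTT"
mirna_b = "TAGCTTATCAGACTGATGTTGA"

# Assumption: miRNA A is expressed 9x higher than miRNA B in this toy sample.
reads = [mirna_a] * 900 + [mirna_b] * 100

# Counting all reads keeps the 9:1 expression difference.
raw_counts = Counter(reads)

# Naive deduplication keeps one copy of each distinct sequence,
# so every miRNA ends up with a count of 1.
dedup_counts = Counter(set(reads))

for name, seq in (("miRNA A", mirna_a), ("miRNA B", mirna_b)):
    print(f"{name}: raw={raw_counts[seq]} deduplicated={dedup_counts[seq]}")
```

In this toy case the raw counts are 900 and 100, while after deduplication both drop to 1, which is the kind of count loss described earlier in the thread.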

What tool did you use to perform the alignment?

Regards

amir · September 27, 2021, 10:52am

I used Bowtie, not Bowtie2. I ran it twice: first to exclude rRNAs, and then a second time.
Best
