Deduplication for miRNA analysis


What is the best tool for deduplicating my data?
Let's say more than 50% of my reads are duplicates. How should I remove them?
I have seen some tools, but I'm not sure which one is best for my analysis.



Hi @amir

Mapping the reads, marking duplicates in BAM results, then removing or ignoring them is one way. Samtools and Picard have tools to mark and/or remove duplicates.
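To make the idea concrete, here is a minimal Python sketch of position-based duplicate marking. It is a simplified illustration only, not how Samtools or Picard are implemented; the `mark_duplicates` function and the toy alignments are made up for the example, and it assumes each read is already reduced to a chromosome, start position, and strand.

```python
def mark_duplicates(alignments):
    """Flag reads that share (chrom, pos, strand) with an earlier read.

    alignments: list of (read_name, chrom, pos, strand) tuples.
    Returns a list of booleans, True where the read is a duplicate.
    """
    seen = set()
    flags = []
    for _name, chrom, pos, strand in alignments:
        key = (chrom, pos, strand)
        # The first read at a given key is kept; later ones are flagged.
        flags.append(key in seen)
        seen.add(key)
    return flags


# Hypothetical toy alignments, not real data.
alignments = [
    ("r1", "chr1", 100, "+"),
    ("r2", "chr1", 100, "+"),
    ("r3", "chr1", 100, "-"),
    ("r4", "chr2", 250, "+"),
    ("r5", "chr1", 100, "+"),
]

flags = mark_duplicates(alignments)
kept = [aln[0] for aln, is_dup in zip(alignments, flags) if not is_dup]
print(flags)  # [False, True, False, False, True]
print(kept)   # ['r1', 'r3', 'r4']
```

Marking and removing are kept separate here on purpose: flagged reads can be ignored by downstream tools or dropped later, which matches the "removing or ignoring" choice above.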

Variant analysis tutorials have example workflows. For other protocols, search with the keyword “duplicates” or review the protocols themselves as many include QA/QC steps for addressing PCR duplicates.

You can also search the tool panel – some tool suites have tools specific to the analysis goals, and not all are incorporated into a GTN tutorial (external documentation is usually linked on the tool form).

The question is pretty broad. If you have a specific analysis goal in mind and can’t find help in the above resources, describe what you are doing in more detail: the read protocol, the planned analysis tools and goal, and anything else special about your data (for example, how the 50% duplicate rate was determined, if that has already been done).
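How the duplicate rate was determined matters because different tools define it differently. As one possible definition, the sketch below counts exact-sequence duplicates in a FASTQ file. This is an assumption for illustration, not necessarily what your QC tool reports; `fastq_sequences` and `duplicate_rate` are hypothetical helper names, and the parser assumes plain four-line FASTQ records.

```python
from collections import Counter


def fastq_sequences(lines):
    """Yield the sequence line of each record, assuming 4-line FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def duplicate_rate(sequences):
    """Fraction of reads that are extra copies of an already-seen sequence."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - len(counts)) / total


# Hypothetical toy FASTQ content, not real data.
fastq_lines = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "ACGT", "+", "IIII",
    "@r3", "ACGT", "+", "IIII",
    "@r4", "TTGA", "+", "IIII",
    "@r5", "GGCC", "+", "IIII",
    "@r6", "GGCC", "+", "IIII",
]

rate = duplicate_rate(fastq_sequences(fastq_lines))
print(rate)  # 0.5 (6 reads, 3 unique sequences)
```

For a real file, the same functions could be fed an open file handle instead of the list.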

I am working on small RNA-seq (miRNA), and when I correct for GC content and remove duplicates, my counts drop sharply.
This happens especially when I correct the GC percentage.
For miRNAs, should I do anything about the GC content and duplication level at all, or leave them as they are?


Hi @amir,
in the case of miRNA, I suggest you go on with the analysis without considering the GC and duplication content, for these reasons:

  • miRNA sequences usually contain overrepresented motifs that could interfere with the real duplication values (because in many cases they originated from duplications).
  • The miRNA GC content is quite variable, and depending on the organism, the miRNAs can be enriched in those nucleotides (Lam et al., Mishra et al.).
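If you want to see what these filters would do to your data before deciding, a small sketch like the one below tallies read counts and GC fraction per unique sequence. The sequences are invented toy strings, not real miRNAs, and `gc_fraction` and `summarize` are hypothetical helper names.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def summarize(reads):
    """Map each unique sequence to (read count, GC fraction)."""
    counts = Counter(reads)
    return {seq: (n, gc_fraction(seq)) for seq, n in counts.items()}


# Invented toy reads: three distinct sequences with different abundance.
reads = ["ACGTACGTAA"] * 5 + ["GGCCGGCCAT"] * 2 + ["ATATATATAT"]

summary = summarize(reads)
for seq, (count, gc) in summary.items():
    print(seq, count, gc)

# Collapsing identical reads would leave one read per sequence,
# so the 5 : 2 : 1 abundance difference would be lost.
after_dedup = {seq: 1 for seq in summary}
print(after_dedup)
```

In this toy case the counts (5, 2, 1) are the signal you would want to keep, and the GC fractions (0.4, 0.8, 0.0) differ between sequences independently of abundance.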

What tool did you use to perform the alignment?



I used Bowtie. I ran it twice: first to exclude rRNAs, and then a second time, again with Bowtie, not Bowtie2.