Removing PCR duplicates in GStacks2, galaxy.eu

Hello everyone,

I’m sorry I’ve been away for so long.

Thank you for the reply. Yes, it’s the same tool I’m using. However, I do not have a reference genome. My data have to be run in “de novo” mode, and within its options I do not see PCR duplicate removal.

Do you have any suggestions?

Thanks

Silvia Bettenocurt

Hi @Silvia

This is a good followup question, so I’m glad you asked!

Duplicate removal is possible with gstacks when there is a reference because the reads are first mapped to it (which is why the input is an aligned BAM, not fastq). The mapping sets up a coordinate system in which reads with “exactly repeated” mapping characteristics can be filtered out as PCR duplicates.
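To make the coordinate idea concrete, here is a minimal Python sketch of filtering on repeated mapping characteristics. It is only an illustration and not how gstacks is implemented: the `AlignedPair` record, its fields, and the `drop_coordinate_duplicates` function are hypothetical names, and the choice of sample, chromosome, start, strand, and insert length as the "signature" is an assumption made for the example.

```python
from collections import namedtuple

# Hypothetical, simplified record of one aligned read pair.
# This stands in for a parsed BAM record; it is not a real BAM parser.
AlignedPair = namedtuple("AlignedPair", "name sample chrom start strand insert_len")


def drop_coordinate_duplicates(pairs):
    """Keep only the first read pair seen for each mapping signature."""
    seen = set()
    kept = []
    for p in pairs:
        # Assumed signature: same sample, same position, same strand,
        # same insert length. Without a reference there is no chrom/start.
        key = (p.sample, p.chrom, p.start, p.strand, p.insert_len)
        if key in seen:
            continue  # treated as a PCR duplicate in this sketch
        seen.add(key)
        kept.append(p)
    return kept


pairs = [
    AlignedPair("r1", "s1", "chr1", 100, "+", 350),
    AlignedPair("r2", "s1", "chr1", 100, "+", 350),  # same signature as r1
    AlignedPair("r3", "s1", "chr1", 100, "+", 420),  # different insert length
    AlignedPair("r4", "s2", "chr1", 100, "+", 350),  # different sample
]
kept = drop_coordinate_duplicates(pairs)
print([p.name for p in kept])  # ['r1', 'r3', 'r4']
```

The point of the sketch is that the key is built from alignment coordinates, so this kind of filter needs reads that have already been placed on a reference.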

With de novo, there isn’t a baseline coordinate mapping to use, and gstacks isn’t the tool of choice.

See the guide here

Then, the FAQs link has Stacks: Frequently Asked Questions

What are the input and output data formats for Stacks?

In the de novo case, data is read by the ustacks program and it currently can read either FASTA, FASTQ, or BAM formats. When a reference genome is available, aligned data is read by the gstacks program and either SAM or BAM formats can be input.

The tool is in Galaxy as one of these

  • Stacks2: ustacks Identify unique stacks

  • Stacks2: de novo map the Stacks pipeline without a reference genome

  • Stacks: ustacks align short reads into stacks

  • Stacks: de novo map the Stacks pipeline without a reference genome (denovo_map.pl)(Galaxy Version 1.46.0)

The protocol is in a publication (paywalled :upside_down_face:) but we don’t have a dedicated Galaxy tutorial I can point you to. If you can’t access the paper, try searching online to see if anyone has broken the steps out. The core steps will be about the same in Galaxy – the difference is usually just how to set the metadata such as datatypes, and these are all fastq, BAM, and tabular datatypes, which are common across many tools.

If you are new to Galaxy, consider running through a Learning Pathway like this to get familiar with how to organize data and navigate around the interface. And, if you are already familiar with bioinformatics analysis, you can simply consider this a reference. → Learning Pathway: Introduction to Galaxy and Sequence analysis.

Hope this helps again! :slight_smile:

Good afternoon.

Thank you so much for the input. I guess I’ll try moving forward without removing PCR duplicates and see what happens. I can always return and reanalyse from scratch with other Stacks parameters. Thank you :slight_smile:

Silvia


On Saturday, October 18, 2025, 00:49, Jennifer Hillman-Jackson via Galaxy Community Help notifications@galaxy.discoursemail.com wrote:

Hi @Silvia

Your question was on my mind, so I ran some more searches against online discussions to confirm my observations about how the tools are intended to be used, and didn’t find anything new of consequence. That said, there was an interesting discussion about removing PCR duplicates at Biostars where the investigator was avoiding PCR duplicate removal! So, mixed advice but maybe helpful. You could also ask a new question there to see what happens – that forum reaches more scientists out in the wild who may have more to add. The tools in Galaxy are still the original tools, so any decisions about parameters discussed there would apply here too. :slight_smile: