Removing PCR duplicates in GStacks2, galaxy.eu

Silvia · October 15, 2025, 2:34pm

Hello everyone,

I'm sorry I've been away for so long.

Thank you for the reply. Yes, it's the same tool I'm using. However, I do not have a reference genome. My data have to be run in “de novo” mode, and within its options I do not see PCR duplicate removal.

Do you have any suggestions?

Thanks

Silvia Bettenocurt

jennaj · October 18, 2025, 12:39am

Hi @Silvia

This is a good followup question, so I’m glad you asked!

Duplicate removal is possible with gstacks when there is a reference because the reads are first mapped to it (which is why the input is an aligned BAM, not fastq). The mapping sets up a coordinate system in which reads with “exactly repeated” mapping characteristics can be filtered out as PCR duplicates.
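To make the coordinate idea concrete, here is a toy sketch in Python. It is only an illustration of filtering on "exactly repeated" mapping characteristics, not how gstacks implements it; the `AlignedPair` record and its fields are assumptions made up for this example.

```python
from typing import List, NamedTuple


class AlignedPair(NamedTuple):
    """Hypothetical record for one mapped read pair (illustration only)."""
    name: str
    chrom: str
    start: int       # leftmost aligned position of the pair
    insert_len: int  # distance spanned by the two mates
    strand: str


def drop_coordinate_duplicates(pairs: List[AlignedPair]) -> List[AlignedPair]:
    """Keep the first pair seen for each identical set of mapping coordinates."""
    seen = set()
    kept = []
    for p in pairs:
        # Pairs sharing all of these values are treated as PCR duplicates.
        key = (p.chrom, p.start, p.insert_len, p.strand)
        if key in seen:
            continue
        seen.add(key)
        kept.append(p)
    return kept


pairs = [
    AlignedPair("r1", "chr1", 100, 350, "+"),
    AlignedPair("r2", "chr1", 100, 350, "+"),  # same coordinates as r1
    AlignedPair("r3", "chr1", 100, 420, "+"),  # different insert length
]
kept_names = [p.name for p in drop_coordinate_duplicates(pairs)]
```

Without a reference there is no `chrom`/`start` to build such a key from, which is the point of the next paragraph.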

With de novo, there isn’t a baseline coordinate mapping to use, and gstacks isn’t the tool of choice.

See the guide here, then the FAQs link: Stacks: Frequently Asked Questions

What are the input and output data formats for Stacks?

In the de novo case, data is read by the ustacks program and it currently can read either FASTA, FASTQ, or BAM formats. When a reference genome is available, aligned data is read by the gstacks program and either SAM or BAM formats can be input.
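The FAQ answer above boils down to a small lookup. A minimal sketch in Python, where the function name `entry_program` is made up for this example and the format lists are taken only from the quoted FAQ text:

```python
def entry_program(has_reference: bool, input_format: str) -> str:
    """Return the Stacks program that reads the data, per the FAQ quote above."""
    fmt = input_format.lower()
    if has_reference:
        # Reference-based: aligned reads go to gstacks.
        program, allowed = "gstacks", {"sam", "bam"}
    else:
        # De novo: reads go to ustacks.
        program, allowed = "ustacks", {"fasta", "fastq", "bam"}
    if fmt not in allowed:
        raise ValueError(
            f"{program} does not read {input_format!r}; expected one of {sorted(allowed)}"
        )
    return program


# The situation in this thread: no reference genome, fastq reads.
choice = entry_program(has_reference=False, input_format="fastq")
```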

The tool is available in Galaxy as one of these:

Stacks2: ustacks Identify unique stacks
Stacks2: de novo map the Stacks pipeline without a reference genome
Stacks: ustacks align short reads into stacks
Stacks: de novo map the Stacks pipeline without a reference genome (denovo_map.pl) (Galaxy Version 1.46.0)

The protocol is in a publication (paywalled), and we don’t have a dedicated Galaxy tutorial I can point you to. If you can’t access the paper, try searching online to see if anyone has broken the steps out. The core steps will be about the same in Galaxy. The difference is usually just how to set the metadata such as datatypes, and these are all fastq, BAM, and tabular datatypes, which are common across many tools.

If you are new to Galaxy, consider running through a Learning Pathway like this one to get familiar with how to organize data and navigate around the interface. And, if you are already familiar with bioinformatics analysis, you can simply treat it as a reference. → Learning Pathway: Introduction to Galaxy and Sequence analysis.

Hope this helps again!
