Multiple files after demultiplexing using radtags - what files do I use for mapping, SNP calling, etc.?

ElizabethSheldon · May 30, 2024, 8:46am

Hi there!
Thank you in advance for your help.
I am following the Le Bras tutorial (Hands-on: RAD-Seq Reference-based data analysis / RAD-Seq Reference-based data analysis / Ecology) and am at the demultiplexing step, where I am trying to demultiplex paired-end reads from ddRAD-Seq data. I seem to have successfully demultiplexed my data (24 barcodes) with Stacks: process radtags. The tool produced 4 outputs, all green in the history:

Demultiplexed reads from data (4 files for each of the 24 samples)
Remaining orphan reads from data (rem.1 and rem.2 files for each of the 24 samples)
Stacks: process radtags discarded reads from data (two files R1 and R2)
results.log with Stacks (overview showing 9971357 retained reads from a total of 12119366 sequences, with 1405312 ambiguous barcodes, and 742697 ambiguous RAD-tags).
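
As a quick sanity check on the log numbers above, the three categories can be summed and compared against the total. This is only a minimal sketch: the variable and function names are illustrative and are not part of the Stacks output.

```python
# Counts copied from the results.log overview quoted above.
TOTAL_SEQUENCES = 12119366
RETAINED_READS = 9971357
AMBIGUOUS_BARCODES = 1405312
AMBIGUOUS_RADTAGS = 742697


def categories_sum_to_total() -> bool:
    """Return True if retained plus discarded reads account for every sequence."""
    accounted = RETAINED_READS + AMBIGUOUS_BARCODES + AMBIGUOUS_RADTAGS
    return accounted == TOTAL_SEQUENCES


def retained_percent() -> float:
    """Percentage of input sequences that were retained after demultiplexing."""
    return 100 * RETAINED_READS / TOTAL_SEQUENCES


if __name__ == "__main__":
    print("Categories sum to total:", categories_sum_to_total())
    print(f"Retained: {retained_percent():.1f}%")
```

The three categories add up to the total, so roughly four in five sequences were kept.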

I would now like to continue with the tutorial (quality control, mapping, SNP calling, etc.), but I do not know how to use the demultiplexed data in the following steps. For example, I cannot seem to use the demultiplexed files for FastQC. How do I use the demultiplexed files? Do I need to combine them into one file, and if so, is there a tutorial for this?

Thank you!
Ellie

jennaj · June 6, 2024, 10:45pm

Welcome, @ElizabethSheldon

You probably just need to change the “shape” of your collection.

What you have now is a nested collection of pairs.
You can convert this to and from a flattened collection with the Collection Operations tools. Flattening lets certain tools understand the data, and it helps keep track of samples whose forward/reverse reads may share the same labels (“element identifiers”) right now.
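
To make the reshaping described above concrete, here is a conceptual sketch in plain Python. It is not Galaxy code: the dictionary layout, the `flatten_pairs` function, and the `sample_direction` naming scheme are all assumptions for illustration. It only shows how flattening gives every read file its own unique element identifier.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a nested collection of pairs:
# sample name -> {"forward": dataset, "reverse": dataset}
NestedPairs = Dict[str, Dict[str, str]]


def flatten_pairs(nested: NestedPairs, sep: str = "_") -> List[Tuple[str, str]]:
    """Flatten a nested collection into (element identifier, dataset) tuples.

    Each new identifier joins the sample name and the read direction, so
    forward and reverse reads no longer share the same label.
    """
    flat: List[Tuple[str, str]] = []
    for sample, pair in nested.items():
        for direction in ("forward", "reverse"):
            flat.append((f"{sample}{sep}{direction}", pair[direction]))
    return flat


if __name__ == "__main__":
    # File names are made up for the example.
    nested = {
        "sample_01": {"forward": "sample_01.1.fq", "reverse": "sample_01.2.fq"},
        "sample_02": {"forward": "sample_02.1.fq", "reverse": "sample_02.2.fq"},
    }
    for identifier, dataset in flatten_pairs(nested):
        print(identifier, dataset)
```

In Galaxy itself no files are copied during this step; only the labels and the structure change.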

This tutorial covers how to do QA using these collection tools. Hands-on: Reference-based RNA-Seq data analysis / Reference-based RNA-Seq data analysis / Transcriptomics.

What to notice:

Only the FastQC/MultiQC steps need the flattened version of the read collection
The actual trimming tools can consume the nested read collection
You can also flatten the post-trimming reads and run them through FastQC/MultiQC again to make sure the trimming did what you wanted
Consider putting all those steps into a mini-workflow to make this less tedious!

More about collection folders → GTN Materials Search (query=collections)

Hope this helps! These collection tools may seem complicated the first time, but once you learn how they work it becomes easier. It is part of the “analysis tax” of having so many tools, from so many different tool authors, available in one place. Meaning, sometimes minor adjustments to how the data is arranged are needed to make it all work together. Just know that manipulating collections doesn’t actually duplicate the data: everything references the original files, and what you are changing are the labels and data structures only.
