Multiple files after demultiplexing using radtags - which files do I use for mapping, SNP calling, etc.?

Hi there!
Thank you in advance for your help.
I am following the Le Bras tutorial (Hands-on: RAD-Seq Reference-based data analysis / RAD-Seq Reference-based data analysis / Ecology) and am at the demultiplexing step, trying to demultiplex paired-end reads from dd-RAD-Seq data. I seem to have successfully demultiplexed my data (24 barcodes) with Stacks RadTags. RadTags has produced four green history items:

  1. Demultiplexed reads from data (4 files for each of the 24 samples)
  2. Remaining orphan reads from data (rem.1 and rem.2 files for each of the 24 samples)
  3. Stacks: process radtags discarded reads from data (two files R1 and R2)
  4. results.log with Stacks (overview showing 9971357 retained reads from a total of 12119366 sequences, with 1405312 ambiguous barcodes, and 742697 ambiguous RAD-tags).
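
As a quick sanity check on those log numbers, here is a minimal sketch that only re-does the arithmetic on the counts quoted above. The variable names are made up for illustration and are not Stacks field names.

```python
# Counts copied from the results.log summary quoted above.
# Variable names are illustrative, not Stacks output field names.
total_sequences = 12119366
retained_reads = 9971357
ambiguous_barcodes = 1405312
ambiguous_radtags = 742697

# Check whether the three quoted categories account for every input sequence.
accounted_for = retained_reads + ambiguous_barcodes + ambiguous_radtags
print("all sequences accounted for:", accounted_for == total_sequences)

# Share of input sequences that survived demultiplexing.
retained_pct = 100 * retained_reads / total_sequences
print(f"retained: {retained_pct:.1f}%")
```

The three categories add up to the total, and roughly 82% of the input sequences were retained.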

I now want to continue with the tutorial (quality control, mapping, SNP calling, etc.), but I do not know how to use the demultiplexed data in the following steps. For example, I cannot seem to select the demultiplexed files as input for FastQC. How do I use the demultiplexed files? Do I need to combine them into one file, and if so, is there a tutorial for this?

Thank you!
Ellie :slight_smile:

Welcome, @ElizabethSheldon

You probably just need to change the “shape” of your collection.

  • What you have now is a nested collection of pairs.
  • You can convert this to and from a flattened collection with the Collection Operations tools. Some tools only understand the flattened form, and converting also helps keep track of samples and their forward/reverse reads, which may share the same labels (“element identifiers”) right now.
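
To make the “shape” idea concrete, here is a minimal conceptual sketch in plain Python. It does not call Galaxy at all: dictionaries stand in for collections, the sample and file names are hypothetical, and joining identifiers with an underscore is an assumption about how the flattened labels end up looking.

```python
# Conceptual model only: plain dicts stand in for Galaxy collections.
# Sample names and file names are hypothetical.
nested = {
    "sample_01": {"forward": "sample_01.1.fq", "reverse": "sample_01.2.fq"},
    "sample_02": {"forward": "sample_02.1.fq", "reverse": "sample_02.2.fq"},
}


def flatten(nested_pairs, sep="_"):
    """Turn {sample: {direction: dataset}} into a flat {label: dataset} mapping.

    Each flat label combines the sample name and the read direction, so
    forward and reverse reads no longer share the same identifier.
    """
    flat = {}
    for sample, pair in nested_pairs.items():
        for direction, dataset in pair.items():
            flat[f"{sample}{sep}{direction}"] = dataset
    return flat


flat = flatten(nested)
for label, dataset in flat.items():
    print(label, "->", dataset)
```

In this sketch the flat mapping points at the same file names as the nested one. Only the labels and the nesting change, which mirrors the idea that reshaping a collection does not copy the underlying data.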

This tutorial covers how to do QA using these collection tools: Hands-on: Reference-based RNA-Seq data analysis / Reference-based RNA-Seq data analysis / Transcriptomics.

What to notice:

  1. Only the FastQC/MultiQC steps need the flattened version of the read collection
  2. The actual trimming tools can consume the nested read collection directly
  3. You can also flatten the post-trimming reads and run them through FastQC/MultiQC again to make sure the trimming did what you wanted
  4. Consider putting all those steps into a mini-workflow to make this less tedious!

More about collection folders → GTN Materials Search (query=collections)

Hope this helps! :slight_smile: These collection tools may seem complicated at first, but they get easier with practice. It is part of the “analysis tax” of having so many tools, from so many different tool authors, available in one place: sometimes minor adjustments to how the data is arranged are needed to make it all work together. Just know that manipulating collections doesn’t actually duplicate the data. Everything references the original files, and what you are changing are only the labels and data structures.