Hi there!
Thank you in advance for your help.
I am following the Le Bras tutorial (Hands-on: RAD-Seq Reference-based data analysis / Ecology) and am at the demultiplexing step. I am trying to demultiplex paired-end reads from ddRAD-Seq data. I seem to have demultiplexed my data successfully (24 barcodes) with Stacks: process radtags, which produced 4 green history items:
Demultiplexed reads from data (4 files for each of the 24 samples)
Remaining orphan reads from data (rem.1 and rem.2 files for each of the 24 samples)
Stacks: process radtags discarded reads from data (two files R1 and R2)
results.log with Stacks (overview showing 9971357 retained reads from a total of 12119366 sequences, with 1405312 ambiguous barcodes, and 742697 ambiguous RAD-tags).
I now want to continue with the tutorial (quality control, mapping, SNP calling, etc.), but I do not know how to use the demultiplexed data in the following steps. For example, I cannot seem to use the demultiplexed files for FastQC. How do I use the demultiplexed files? Do I need to combine them into one file, and if so, is there a tutorial for this?
You probably just need to change the “shape” of your collection.
What you have now is a nested collection of pairs.
You can convert this to (or back from) a flattened collection of pairs with the Collection Operations tools. Flattening lets downstream tools understand the data, and it helps keep track of each sample's forward and reverse reads, which may currently share the same labels (“element identifiers”).
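If it helps to picture it, here is a small Python sketch of what flattening does conceptually. This is not Galaxy code or the Galaxy API. The dictionary layout, the batch and sample names, the file names, and the `flatten_pairs` helper are all made up for illustration, and your real collection may be nested differently.

```python
from typing import Dict

# Hypothetical picture of a nested collection of pairs:
# outer label -> sample label -> {"forward": ..., "reverse": ...}
Pair = Dict[str, str]
NestedPairs = Dict[str, Dict[str, Pair]]


def flatten_pairs(nested: NestedPairs, sep: str = "_") -> Dict[str, Pair]:
    """Turn a nested collection of pairs into a flat collection of pairs.

    Each new label joins the outer and inner labels, so two samples that
    shared the same inner label stay distinguishable after flattening.
    """
    flat: Dict[str, Pair] = {}
    for outer_label, samples in nested.items():
        for sample_label, pair in samples.items():
            new_label = f"{outer_label}{sep}{sample_label}"
            if new_label in flat:
                raise ValueError(f"duplicate label after flattening: {new_label}")
            # The pair itself is reused, not copied: only the labels change.
            flat[new_label] = pair
    return flat


# Made-up example: two outer groups that both contain a "sample_01".
nested_example: NestedPairs = {
    "batch_A": {
        "sample_01": {"forward": "A_s01.1.fq", "reverse": "A_s01.2.fq"},
        "sample_02": {"forward": "A_s02.1.fq", "reverse": "A_s02.2.fq"},
    },
    "batch_B": {
        "sample_01": {"forward": "B_s01.1.fq", "reverse": "B_s01.2.fq"},
    },
}

flat_example = flatten_pairs(nested_example)
for label, pair in flat_example.items():
    print(label, pair["forward"], pair["reverse"])
```

The point of the sketch is that flattening removes one level of nesting and relabels the elements, while every forward/reverse pair stays together and still points at the same underlying files.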
Hope this helps! These collection tools may seem complicated the first time, but once you learn how they work it becomes easier. It is part of the “analysis tax” of having so many tools, from so many different tool authors, available in one place. That is, minor adjustments to how the data is arranged are sometimes needed to make everything work together. Just know that manipulating collections doesn’t actually duplicate the data: everything references the original files, and what you are changing are only the labels and data structures.