I’m trying to create a mutation-calling workflow that can process a variable number of mutant strains in parallel. I’d like to supply a single collection of files as input and use rules to group my sequencing data by mutant strain, then run the alignment and mutation calling separately on each group. It is this last part that I can’t figure out. I don’t see anything in the collection operations section that would let me do this, but I’m new to Galaxy, so I’m probably missing something.
For example, if I had input data files like this (file name, then group):
A123_0001_S1_R1_L001.fq.gz Group1
A123_0001_S1_R2_L001.fq.gz Group1
A123_0001_S1_R1_L002.fq.gz Group1
A123_0001_S1_R2_L002.fq.gz Group1
A123_0002_S2_R1_L001.fq.gz Group2
A123_0002_S2_R2_L001.fq.gz Group2
A123_0002_S2_R1_L002.fq.gz Group2
A123_0002_S2_R2_L002.fq.gz Group2
I’d like to create a collection that keeps the pair information (R1/R2) and groups the files by sample (S), for an arbitrary number of sample groups. I would then align and call mutations on each group in parallel, ultimately returning a collection of VCF files with one file per group.
Is this possible? If so, I’d appreciate some pointers on how to do it.
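To make the intended structure concrete, here is a minimal sketch in plain Python (outside Galaxy) of the grouping described above: sample, then lane, then the R1/R2 pair. The regular expression, the field names `run`, `id`, `sample`, `read` and `lane`, and the function `group_reads` are illustrative assumptions based only on the example file names; this is not Galaxy's API or its rule syntax.

```python
import re
from collections import defaultdict

# Assumed file name layout, inferred from the example names only:
#   <run>_<id>_<sample>_<read>_<lane>.fq.gz, e.g. A123_0001_S1_R1_L001.fq.gz
FILENAME_RE = re.compile(
    r"^(?P<run>[^_]+)_(?P<id>[^_]+)_(?P<sample>S\d+)"
    r"_(?P<read>R[12])_(?P<lane>L\d+)\.fq\.gz$"
)


def group_reads(filenames):
    """Group file names as sample -> lane -> {"R1": name, "R2": name}."""
    groups = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        match = FILENAME_RE.match(name)
        if match is None:
            # Fail loudly rather than silently dropping a file.
            raise ValueError(f"Unexpected file name: {name}")
        groups[match["sample"]][match["lane"]][match["read"]] = name
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(lanes) for sample, lanes in groups.items()}
```

Calling `group_reads` on the eight example files gives two top-level groups, `S1` and `S2`, each holding lanes `L001` and `L002` with one R1/R2 pair per lane. Each top-level group corresponds to one alignment and mutation-calling run, and so to one output VCF.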