How do I create and visualize a nested paired list

I have data from 13 samples, each sequenced across 4 lanes in both directions (paired end). So each sample has 8 data files, or 4 pairs of data files.
I am trying to create a nested, paired collection. But when I use the Nested Paired builder, I just get a paired list without the nested samples. I put the sample name in the Outer List Identifier, but this doesn’t seem to do anything?

Welcome @Lauren_White

Yes, this can be a bit complicated!

The organization would be something like this:

mycollection

sampleA, sampleB, sampleN

where sampleA contains sampleA_1, sampleA_2, sampleA_N

where sampleA_1 contains sampleA_1_R1, sampleA_1_R2

where

outer list identifier = sampleA
inner list identifier = sampleA_1
element identifier = sampleA_1_R1, sampleA_1_R2 (paired reads)
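If it helps to see the three levels as data, here is a minimal plain-Python sketch (not Galaxy code) that groups file names into that outer → inner → pair shape. It assumes a hypothetical naming pattern like `sampleA_L001_R1.fastq.gz`, and `build_nested_paired` is a made-up helper for illustration. The two members of each pair are keyed `forward` and `reverse`, which is how Galaxy's paired collections label them.

```python
import re
from collections import defaultdict

# ASSUMPTION: files are named like "sampleA_L001_R1.fastq.gz".
# This pattern is hypothetical; adjust the regex to match your real file names.
PATTERN = re.compile(r"^(?P<sample>.+)_L(?P<lane>\d+)_R(?P<read>[12])\.fastq\.gz$")


def build_nested_paired(filenames):
    """Group files as {outer_id: {inner_id: {"forward": ..., "reverse": ...}}}."""
    nested = defaultdict(lambda: defaultdict(dict))
    for name in filenames:
        m = PATTERN.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        outer = m["sample"]                          # outer list identifier, e.g. sampleA
        inner = f"{m['sample']}_{int(m['lane'])}"    # inner list identifier, e.g. sampleA_1
        direction = "forward" if m["read"] == "1" else "reverse"
        if direction in nested[outer][inner]:
            raise ValueError(f"duplicate {direction} read for {inner}")
        nested[outer][inner][direction] = name
    # Convert the defaultdicts into plain dicts for easier inspection
    return {outer: dict(inners) for outer, inners in nested.items()}


# 13 samples x 4 lanes x 2 directions = 104 files, as in the question
samples = [f"sample{chr(ord('A') + i)}" for i in range(13)]
files = [f"{s}_L00{lane}_R{r}.fastq.gz" for s in samples for lane in range(1, 5) for r in (1, 2)]
nested = build_nested_paired(files)
print(len(nested), len(nested["sampleA"]))  # 13 4
print(nested["sampleA"]["sampleA_1"])
```

Each sample ends up with 4 inner elements, and each inner element holds exactly one forward and one reverse file.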

I’m guessing that the inner list identifiers are the problem. Why? Each pair also needs its own unique label, and you can simply number them. As long as both reads in a given pair share the same inner identifier, this should work. If all of a sample's reads had the same inner identifier, that would explain the result you got.
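A quick way to check your identifiers before building the collection is to count reads per inner identifier. This is a hypothetical plain-Python sketch, assuming you can list each file as an `(outer, inner, read)` tuple; `find_bad_pairs` is a made-up name, not a Galaxy function.

```python
from collections import Counter


def find_bad_pairs(records):
    """records: iterable of (outer_id, inner_id, read) tuples, read being "R1" or "R2".

    Returns {(outer_id, inner_id): Counter} for every inner identifier that
    does not hold exactly one R1 and one R2."""
    expected = Counter({"R1": 1, "R2": 1})
    counts = {}
    for outer, inner, read in records:
        counts.setdefault((outer, inner), Counter())[read] += 1
    return {key: c for key, c in counts.items() if c != expected}


# Numbered inner identifiers: every pair gets its own label, so nothing is flagged
good = [("sampleA", f"sampleA_{lane}", r) for lane in range(1, 5) for r in ("R1", "R2")]
# The same inner identifier reused for every read of the sample
bad = [("sampleA", "sampleA", r) for _ in range(4) for r in ("R1", "R2")]

print(find_bad_pairs(good))  # {}
print(find_bad_pairs(bad))   # flags ('sampleA', 'sampleA') with 4 x R1 and 4 x R2
```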

Yes, like this:

where sampleA contains sampleA_1, sampleA_2, sampleA_N
where sampleA_1 contains sampleA_1_R1, sampleA_1_R2

Not like this:

where sampleA contains sampleA_R1, sampleA_R2, sampleA_R1, sampleA_R2


The example in this tutorial is really helpful for understanding the concept. See section 5, exactly here: Hands-on: Rule Based Uploader: Advanced / Using Galaxy and Managing your Data (#apply-rules-to-existing-collections)

In short, you can define multiple list identifiers, and they are applied in the order specified: outer → inner. The example in that part of the tutorial has three levels. If you are using the Build Collection wizard instead, you get two list levels, and the identifiers must be unique within each level.
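To make the "applied in order, unique at each level" idea concrete, here is a generic plain-Python sketch. It is not Galaxy's rule builder; `nest` is a hypothetical helper that treats the first identifier column as the outermost level and rejects any repeated identifier path.

```python
def nest(rows, depth):
    """Build a nested dict from rows shaped (id_1, ..., id_depth, value).

    Identifiers are applied in order: id_1 is the outermost level, id_depth the
    innermost. A repeated identifier path raises an error, mirroring the rule
    that every element needs a unique identifier at its level."""
    root = {}
    for row in rows:
        *ids, value = row
        if len(ids) != depth:
            raise ValueError(f"expected {depth} identifiers, got {len(ids)}: {row}")
        node = root
        for ident in ids[:-1]:
            node = node.setdefault(ident, {})   # descend, creating levels as needed
        if ids[-1] in node:
            raise ValueError(f"duplicate identifier path: {'/'.join(ids)}")
        node[ids[-1]] = value
    return root


# Three levels: sample -> lane -> direction (file names are made up)
rows = [
    ("sampleA", "1", "forward", "A_L1_R1.fq"),
    ("sampleA", "1", "reverse", "A_L1_R2.fq"),
    ("sampleA", "2", "forward", "A_L2_R1.fq"),
    ("sampleA", "2", "reverse", "A_L2_R2.fq"),
]
tree = nest(rows, depth=3)
print(tree["sampleA"]["2"]["reverse"])  # A_L2_R2.fq
```

Adding a second `("sampleA", "1", "forward", ...)` row would raise an error, which is roughly the situation where identifiers are not unique at some level.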

The rest of the tutorial above is a useful guide to unpacking the data so it can be sent to tools. Most tools expect either a List or a List of Pairs as input. It is useful to load all the data together at the start, then use that master collection as the “source” for downstream subsetting and processing.
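In Galaxy you would do this with the collection operation tools rather than code, but the shape change itself is simple. Here is a plain-Python sketch that turns a nested list of pairs into a flat list of pairs, optionally keeping only some samples. `flatten_to_list_of_pairs`, the `_` separator and the data are all hypothetical.

```python
def flatten_to_list_of_pairs(nested, keep=None, sep="_"):
    """Turn {outer_id: {inner_id: pair}} into a flat [(identifier, pair), ...] list.

    keep: optional set of outer identifiers (samples) to retain; None keeps all.
    sep:  string used to join outer and inner identifiers (a hypothetical choice)."""
    flat = []
    for outer, inners in nested.items():
        if keep is not None and outer not in keep:
            continue  # subsetting: skip samples that were not requested
        for inner, pair in inners.items():
            flat.append((f"{outer}{sep}{inner}", pair))
    return flat


# Hypothetical "master" nested collection with numbered inner identifiers
master = {
    "sampleA": {"1": {"forward": "A1_R1.fq", "reverse": "A1_R2.fq"},
                "2": {"forward": "A2_R1.fq", "reverse": "A2_R2.fq"}},
    "sampleB": {"1": {"forward": "B1_R1.fq", "reverse": "B1_R2.fq"}},
}

all_pairs = flatten_to_list_of_pairs(master)
only_a = flatten_to_list_of_pairs(master, keep={"sampleA"})
print([ident for ident, _ in all_pairs])  # ['sampleA_1', 'sampleA_2', 'sampleB_1']
print([ident for ident, _ in only_a])     # ['sampleA_1', 'sampleA_2']
```

Joining the outer and inner identifiers keeps every flattened element unique even when the inner identifiers are just numbers.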


Does this help? Would you like to share a bit more about your steps? Screenshots might be a good way to communicate for this. :slight_smile: