Need help flattening collections!

bahooper · April 11, 2025, 1:39am

Hello! I am a student who is new to Galaxy and I’m having trouble flattening my dataset. I have a history with all the datasets and 4 separate, smaller dataset histories based on certain criteria (basically I categorized them). I am supposed to flatten each separate dataset to do quality control, but I’m confused about whether I need to make them collections first. They are single-end, not paired, so I wouldn’t need any paired collections. I would appreciate any tips or suggestions! Thank you.

wm75 · April 11, 2025, 8:16am

I assume you mean concatenate instead of flatten?
If so, this tool Concatenate datasets tail-to-head (cat) is an option.
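
For intuition, tail-to-head concatenation just appends each dataset's contents to the end of the previous one. Here is a minimal Python sketch of the same idea outside Galaxy, assuming FASTQ files on local disk. The function name `concatenate_files` and any file names are hypothetical, and this is not how the Galaxy tool itself is implemented:

```python
import shutil
from pathlib import Path


def concatenate_files(inputs, output):
    """Append each input file to `output` in the order given (tail-to-head)."""
    with open(output, "wb") as out_handle:
        for path in inputs:
            with open(path, "rb") as in_handle:
                # Copy raw bytes; FASTQ records are not parsed or validated.
                shutil.copyfileobj(in_handle, out_handle)
    return Path(output)
```

Because the copy is byte-wise, the order of `inputs` is the order of reads in the result, and each input should end with a newline so records do not run together.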

jennaj · April 11, 2025, 4:58pm

To clarify a bit more here: QA tools like FastQC and MultiQC expect unique sample identifiers so that they can generate per-sample statistics and then combine them into a summary.

Some collection folder formats do need to be “flattened” for this purpose. In particular, paired-end collections benefit from the restructuring and explicit naming of forward/reverse reads. You might also want to rename to designate the processing state (raw versus trimmed).

For single-end data, creating a simple list collection for the forward reads is enough (no need to “flatten”), but you might still want to rename to designate the processing state.

Getting the data into a collection folder is usually the first step. The collection holds multiple samples, each with a unique sample name (collection “identifier”). You can tag these samples to layer in extra metadata, or create different collections per metadata group (example: control versus treated).

Resources

This topic is a good summary for QA → multQC issue and guidance? - #2 by jennaj
Tutorials for collection operations → Hands-on: Using dataset collections / Using dataset collections / Using Galaxy and Managing your Data
Many tutorials include collections and we have FAQs, too. → GTN Materials Search (query=collection)
Is this single cell analysis? If so, we have tutorials here → Single Cell / Tutorial List

In short: if you need to combine multiple fastq datasets into one fastq dataset (all the reads are from the same sample), then @wm75’s advice is how you will want to proceed. You could also put all of the files for a single sample into a list collection then run Collapse Collection. Do that for each sample, then use Merge collections to organize samples for conditions, or use Apply Rules to add in tags.
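
The per-sample grouping described above can be sketched in Python as a way to plan which files belong in each list collection before collapsing. This assumes hypothetical file names of the form `<sample>_<part>.fastq` with no dots in the sample name; the function name `group_by_sample` is also an assumption, and nothing here calls Galaxy itself:

```python
from collections import defaultdict


def group_by_sample(file_names):
    """Map each sample identifier to the sorted list of its files."""
    groups = defaultdict(list)
    for name in file_names:
        stem = name.split(".", 1)[0]     # drop extensions such as .fastq or .fastq.gz
        sample = stem.rsplit("_", 1)[0]  # drop the trailing part label (assumed naming)
        groups[sample].append(name)
    return {sample: sorted(names) for sample, names in sorted(groups.items())}
```

Each key corresponds to one sample (one list collection to collapse), and the keys double as the unique identifiers that the QA tools expect.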

Instructions for using all of these tools are in the first link above. If you want to explain more or share what you have so far, we can try to help more.
