Hello! I am a student who is new to Galaxy and I’m having trouble flattening my dataset. I have a history with all the datasets and 4 separate, smaller dataset histories based on certain criteria (basically I categorized them). I am supposed to flatten each separate dataset to do quality control, but I’m confused about whether I need to make them collections first. They are single-end, not paired, so I wouldn’t need any paired collections. I would appreciate any tips or suggestions! Thank you.
I assume you mean concatenate instead of flatten?
If so, the tool Concatenate datasets tail-to-head (cat) is an option.
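For anyone curious what tail-to-head concatenation does to the data, here is a minimal Python sketch of the idea. It is not the Galaxy tool's implementation, only an illustration that assumes uncompressed, plain-text FASTQ files. The function name and file names are hypothetical.

```python
from pathlib import Path


def concatenate_tail_to_head(inputs, output):
    """Append each input file to `output`, in the order given.

    Assumes uncompressed text files (for example plain FASTQ).
    Unlike a bare `cat`, a newline is added after any non-empty input
    that lacks one, so the last record of one file cannot fuse with
    the first record of the next.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last_byte = b"\n"  # stays a newline if the file is empty
            with open(path, "rb") as src:
                # Copy in 1 MiB chunks to keep memory use low.
                for chunk in iter(lambda: src.read(1 << 20), b""):
                    out.write(chunk)
                    last_byte = chunk[-1:]
            if last_byte != b"\n":
                out.write(b"\n")
    return Path(output)


# Hypothetical usage: all three files hold reads from the same sample.
# concatenate_tail_to_head(["run1.fastq", "run2.fastq", "run3.fastq"],
#                          "sampleA.fastq")
```

The order of the inputs is the order of the reads in the result, which is why this only makes sense when every input belongs to the same sample.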
To clarify a bit more here: QA tools like FastQC and MultiQC expect unique sample identifiers so they can generate per-sample statistics and then combine them into a summary.
Some collection folder formats do need to be “flattened” for this purpose. In particular, paired end collections benefit from the restructuring and explicit naming of forward/reverse reads. You might also want to rename to designate the processing state (raw versus trimmed).
For single end data, creating a simple list collection format for the forward reads is enough (no need to “flatten”) but you might still want to rename, to designate the processing state.
Getting the data into a collection folder is usually the first step. The collection holds multiple samples, each with a unique sample name (collection “identifier”). You can tag these samples to layer in extra metadata, or create different collections per metadata group (example: control versus treated).
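To make the idea of unique identifiers and per-group collections concrete, here is a small plain-Python model. It does not call Galaxy or any Galaxy API. It only mimics the bookkeeping: one identifier per sample, no duplicates, and an optional split by metadata group. All function names, file names, and group labels are hypothetical.

```python
from collections import Counter, defaultdict
from pathlib import Path


def sample_identifier(filename):
    """Derive an identifier by stripping common FASTQ extensions."""
    name = Path(filename).name
    for suffix in (".gz", ".fastq", ".fq"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return name


def build_list_collection(filenames):
    """Map identifier -> file and refuse duplicate identifiers."""
    identifiers = [sample_identifier(f) for f in filenames]
    duplicates = sorted(i for i, n in Counter(identifiers).items() if n > 1)
    if duplicates:
        raise ValueError(f"duplicate identifiers: {duplicates}")
    return dict(zip(identifiers, filenames))


def split_by_group(collection, group_of):
    """Split one collection into one collection per metadata group."""
    groups = defaultdict(dict)
    for identifier, filename in collection.items():
        groups[group_of[identifier]][identifier] = filename
    return dict(groups)


# Hypothetical single-end samples and a control/treated assignment.
files = ["ctrl_1.fastq.gz", "ctrl_2.fastq.gz", "trt_1.fastq.gz"]
collection = build_list_collection(files)
groups = split_by_group(
    collection,
    {"ctrl_1": "control", "ctrl_2": "control", "trt_1": "treated"},
)
```

The duplicate check reflects why naming matters: two elements with the same identifier could not be told apart in a combined QC summary.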
Resources
- This topic is a good summary for QA → multQC issue and guidance? - #2 by jennaj
- Tutorials for collection operations → Hands-on: Using dataset collections / Using dataset collections / Using Galaxy and Managing your Data
- Many tutorials include collections and we have FAQs, too. → GTN Materials Search (query=collection)
- Is this single cell analysis? If so, we have tutorials here → Single Cell / Tutorial List
In short: if you need to combine multiple fastq datasets into one fastq dataset (all the reads are from the same sample), then @wm75’s advice is how you will want to proceed. You could also put all of the files for a single sample into a list collection then run Collapse Collection. Do that for each sample, then use Merge collections to organize samples for conditions, or use Apply Rules to add in tags.
How to use all of these tools is covered in the first link above. If you want to explain more or share what you have so far, we can try to help more.