Concatenate multiple datasets tool; combining all fastq.gz files under a single barcode to one fastq.gz file

How do I concatenate multiple fastq.gz files into a single fastq.gz file in Galaxy?

I am assembling a bacterial genome from Oxford Nanopore MinION reads, and I have 68 fastq.gz files that all belong to the same barcode.

Under the “What type of data do you wish to concatenate?” menu, I have used the “single dataset” option, but it only concatenates two files at a time, producing many output files.

I want to use the “paired collection” option to concatenate all files into a single file, but it says that “no compatible list of paired datasets available”.

Do I need to uncompress/compress all the files first? If so, do I have to compress/uncompress all 68 files individually?

If it helps, I am using this tutorial to assemble the genome, which starts from a single concatenated fastq.gz file.

Welcome @albert_burkle

To make sure I understand: you currently have a paired-end collection of files in fastq.gz format. Each file contains one or more sequences. You want to concatenate these into a single pair of files, each with multiple sequences.

If that is all true, then the Collection Operation tools can be used instead. → Hands-on: Using dataset collections / Using dataset collections / Using Galaxy and Managing your Data

  1. Start with your paired-end collection
  2. Convert the datatype to plain text fastq → FAQ: Changing the datatype of a collection
  3. Run Unzip collection
  4. On each of the two outputs, run Collapse Collection
  5. Then, to combine the results into a paired collection again, run Zip collections
  6. At the end, you can convert to a compressed format if you want to. Also, remember that most tools expect fastqsanger format, so that may be a better choice.
  7. As a QA check, you could run a tool like Fastq Statistics to make sure the pairs are all still intact and in the right order.
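For anyone who wants to do the step 7 check outside Galaxy, here is a minimal Python sketch of the same idea. It assumes strict 4-line FASTQ records (no wrapped sequence lines) and that mates share a read name apart from an optional `/1` or `/2` suffix. The function names `read_ids` and `pairs_in_sync` are made up for this example and are not part of any Galaxy tool.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read name from each 4-line FASTQ record (plain or .gz)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # Header lines sit at positions 0, 4, 8, ... in 4-line FASTQ.
            if line_number % 4 == 0 and line.strip():
                fields = line[1:].split()
                name = fields[0] if fields else ""
                # Drop an optional mate suffix so both files compare equal.
                if name.endswith(("/1", "/2")):
                    name = name[:-2]
                yield name


def pairs_in_sync(forward_path, reverse_path):
    """Return (ok, n): ok is False at the first mismatch; n counts matched pairs."""
    missing = object()  # marks the shorter file running out of reads
    matched = 0
    ids = zip_longest(read_ids(forward_path), read_ids(reverse_path),
                      fillvalue=missing)
    for forward_id, reverse_id in ids:
        if forward_id != reverse_id:
            return False, matched
        matched += 1
    return True, matched
```

Calling `pairs_in_sync("forward.fastq.gz", "reverse.fastq.gz")` (hypothetical file names) returns `True` plus the number of pairs when both files list the same reads in the same order.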

Please give that a try and let us know if you need more help! :slight_smile:

Hi Jenna!

Thank you for the help. I couldn't figure out how to change the files to fastq, or unzip the collection, but I just ran the Collapse Collection tool and it worked perfectly!

Does that mean my data is not a paired-end collection?

Hi @albert_burkle

You can check your output with the last tool in that listing (Fastq Statistics).

However, I’m not sure if the collapse will work correctly if all the data is compressed already.

You can share back your history if anything still seems odd.
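On the compressed-data question: the gzip format allows a file to consist of several compressed members joined end to end, so fastq.gz files concatenated byte for byte still form a valid fastq.gz file. Whether Collapse Collection produces its output this way is an assumption, not something confirmed in this thread. The Python sketch below shows the idea on a local copy of the files. It assumes 4-line FASTQ records, and the function names are invented for illustration.

```python
import gzip
import shutil


def concatenate_gzip(input_paths, output_path):
    """Append the raw bytes of each .gz file, in order, to output_path."""
    with open(output_path, "wb") as combined:
        for path in input_paths:
            with open(path, "rb") as part:
                shutil.copyfileobj(part, combined)


def count_fastq_records(path):
    """Count 4-line FASTQ records in a gzip-compressed file."""
    # gzip.open reads through every member of a multi-member file.
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

If the record count of the combined file equals the sum of the counts of the input files, the concatenation kept every read.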