How do I concatenate multiple fastq.gz files into a single fastq.gz file in Galaxy?
I am assembling a bacterial genome with Oxford Nanopore MinION sequences, and I have 68 fastq.gz files that are all under the same barcode.
Under the “What type of data do you wish to concatenate?” menu, I have used the “single dataset” option, but it only concatenates two at a time, producing many files.
I want to use the “paired collection” option to concatenate all files into a single file, but it says that “no compatible list of paired datasets available”.
Do I need to uncompress/compress all the files first? If so, do I have to compress/uncompress all 68 files individually?
To make sure I understand: you currently have a paired-end collection of files in fastq.gz format. Each file contains one or more sequences. You want to concatenate these into a single pair of files, each with multiple sequences.
On each of the two outputs (the forward and reverse lists, for example as produced by Unzip collection), run Collapse Collection.
Then, to combine the results into a paired collection again, run Zip collections.
At the end, you can convert to a compressed format if you want to. Also, remember that most tools expect fastqsanger format, so that may be a better choice.
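On the compress/uncompress question, here is a minimal Python sketch for doing the same merge outside Galaxy, assuming the files are available locally. It relies on the gzip format allowing several compressed members in one file, so fastq.gz files can be appended byte for byte without decompressing them first. The function name `concat_gz` and the paths in the usage comment are hypothetical, and this is not how the Galaxy tools are implemented.

```python
import shutil


def concat_gz(inputs, output):
    """Append gzip files byte for byte into one multi-member gzip file.

    Assumption: every input is a complete, valid .gz file. The inputs are
    never decompressed; their bytes are simply copied in the given order.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


# Hypothetical usage: merge every fastq.gz file found in one barcode folder.
# from pathlib import Path
# concat_gz(sorted(Path("barcode01").glob("*.fastq.gz")), "barcode01_all.fastq.gz")
```

The output can be read like any other fastq.gz file, for example with Python's `gzip.open`.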
As a QA check, you could run a tool like Fastq Statistics to make sure the pairs are all still intact and in the right order.
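If you would rather check the pairing directly on downloaded files, a rough sketch of that check in Python could look like the following. It assumes standard 4-line FASTQ records and that mates share the same read ID apart from an optional trailing `/1` or `/2`. All function names here are hypothetical and are not part of any Galaxy tool.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID from each header line of a fastq.gz file."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:  # assumption: strict 4-line FASTQ records
                tokens = line[1:].split()  # drop the leading '@'
                yield tokens[0] if tokens else ""


def strip_mate(read_id):
    """Remove a trailing /1 or /2 mate suffix, if present."""
    if read_id.endswith(("/1", "/2")):
        return read_id[:-2]
    return read_id


def pairs_intact(forward, reverse):
    """Return True if both files hold the same read IDs in the same order."""
    for f_id, r_id in zip_longest(read_ids(forward), read_ids(reverse)):
        if f_id is None or r_id is None:
            return False  # one file has more reads than the other
        if strip_mate(f_id) != strip_mate(r_id):
            return False  # mates are missing or out of order
    return True
```

Calling `pairs_intact("forward.fastq.gz", "reverse.fastq.gz")` with your own file names returns `False` as soon as the two files disagree.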
Please give that a try and let us know if you need more help!