Random failures when concatenating fastqsanger.gz datasets via collection – "Not in GZIP format" errors

jennaj · July 1, 2025, 8:37pm

Yes, there may have been some issues, but those should be resolved now that the release is finalized (we had to re-route some cluster resources, and it wasn’t perfect over the last few weeks). But I’m also wondering if there is a better way to do this in general.

The tools that seem to be better choices are these:

Nested Cross Product
Merge collections
Collapse Collection into single dataset in order of the collection

And, the new collection type:

Nested Collection (lists of lists)

This would work on the collection files directly using the element identifiers, without needing to open the files to read the sequence identifiers. It should be faster and maybe more reliable, especially if these are single-cell data with really long @ title lines. The data would stay compressed until the final collapse step.
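To illustrate why the files don't need to be opened and parsed, here is a minimal local sketch in Python. It is not how the Galaxy tools are implemented; the function names and file paths are hypothetical. It relies on the fact that gzip files joined byte-for-byte form a valid multi-member gzip file, so the inputs can be combined in collection order without decompressing them.

```python
import gzip
import shutil


def collapse_gz(paths, out_path):
    """Join gzip files byte-for-byte, in the order given (hypothetical helper).

    Nothing is decompressed: the result is a multi-member gzip file.
    """
    with open(out_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_reads(gz_path):
    """Count FASTQ records (4 lines each) by reading through all gzip members."""
    with gzip.open(gz_path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

Calling `collapse_gz(["sample1_L001.fastq.gz", "sample1_L002.fastq.gz"], "sample1.fastq.gz")` and then `count_reads("sample1.fastq.gz")` is a quick way to confirm that the merged file still reads cleanly end to end.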

The other option is to use uncompressed data throughout the early sorting steps, when the files are being read repeatedly, and then compress the result at the very end.
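As a rough sketch of that "compress only at the end" idea outside of Galaxy, the Python below reads each input as plain bytes and writes a single gzip output as the last step. The helper names are hypothetical. The two-byte check uses the standard gzip magic number (`1f 8b`), which is also a simple way to spot a file that is labeled `.gz` but is not actually compressed, one possible cause of a "Not in GZIP format" message.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def concat_then_compress(inputs, out_path):
    """Concatenate FASTQ inputs and gzip the result once, at the end.

    Inputs may be plain or gzip-compressed; each one is detected by content,
    not by file extension.
    """
    with gzip.open(out_path, "wb") as out:
        for path in inputs:
            opener = gzip.open if is_gzip(path) else open
            with opener(path, "rb") as src:
                shutil.copyfileobj(src, out)
```

Running `is_gzip` over the inputs before a concatenation step would show whether any of them are mislabeled.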

Whatever you decide, if you want to share back some examples (maybe just the files with that part of the workflow), we can try to help model this and investigate the processing issues, if any remain. But do try again now, since the cluster reconnections and the full release deployment just happened late yesterday.

Let’s start there!
