producing gzip compressed datafiles


Does anyone know if there is a tool or method in Galaxy that can be used to compress a datafile as gzip?


Hello @Zahra_Zangishei

Uncompressing gzip (.gz) data in Galaxy is more common than compressing it again.

That said, many datatypes have an option to convert to a compressed version or format. For plain text data, that will most likely be a bgzip or bzip2 variant in Galaxy. Note that bgzip output is gzip-compatible and ends with a .gz extension, while bzip2 output ends with a .bz2 extension if downloaded.
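Since the extension alone can be misleading, one way to check what a downloaded file really is would be to look at its first bytes. This is only a minimal sketch in Python: the function name `detect_compression` is made up for illustration, and it relies on the standard magic numbers (`1f 8b` for gzip and bgzip, `BZh` for bzip2).

```python
def detect_compression(path):
    """Guess the compression format from the file's leading magic bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head[:2] == b"\x1f\x8b":
        # bgzip writes gzip-compatible blocks, so it matches here as well
        return "gzip"
    if head == b"BZh":
        return "bzip2"
    return "uncompressed or unknown"
```

Calling `detect_compression("my_download.fasta")` (a hypothetical file name) on a plain text file would return `"uncompressed or unknown"`.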

Click on the pencil icon for the dataset to reach the Edit Attributes forms. Click into the “Convert” tab and the pull-down menu will list the options.

Some converters that only perform compression are also under the tool panel section “Convert Formats”. SAM to BAM and similar tools are located there as well.



PS: Others are welcome to add more help! I recall this functionality being discussed but am not aware of current implementations that perform just this single function. I could have missed it, or it may be a work in progress.

Thanks for your reply.
I actually did a BAM to FASTA conversion that resulted in a large uncompressed output. Now I need to download it and upload it to another server, but that server has an upload limit of 1 GB.
Regarding the “Convert” tab of Edit Attributes, I have already checked it but could not find a suitable option. I also checked the tools under the panel section “Convert Formats” and could not find any there either.
I thought “Text reformatting with awk” might be helpful, and I tried running it with " gzip" or " gzip -c". However, the resulting output is not a compressed file.
I am wondering if there is still a solution!

A lot of information is lost in this conversion. What is your goal? What tool are you using for mapping?

Several mapping tools will output the unique original fastq sequences, mapped or unmapped. Most also allow output filtering directly (keeping reads that pass the mapping filter criteria), so a distinct filter step doesn’t necessarily need to be done after mapping. The result could be fastqsanger outputs from the mapping step, as well as BAM. That is more meaningful content than fasta, assuming you started with fastq reads.

If the total working space is 1 GB, that is a bit limiting. You need room for not just the data, but tool runs and outputs.

If the upload limit is 1 GB per transaction, there are a few choices:

  1. Download the data from Galaxy, compress locally, upload to the other site (not clear if this is another Galaxy or not).
  2. Split the fasta into multiple files then combine once at the other site. If both websites are Galaxy, this can be done without the need to download.
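
Both choices can be done locally with a few lines of Python if no other tool is at hand. The following is only a sketch under stated assumptions: the function names `gzip_file` and `split_fasta`, the `part_001.fasta` naming scheme, and the size threshold are all made up for illustration, and nothing here is a Galaxy tool. The split only starts a new piece at a `>` header line, so a piece can overshoot `max_bytes` by up to one record; pick a threshold comfortably below the real upload limit.

```python
import gzip
import shutil


def gzip_file(src, dest=None):
    """Write a gzip-compressed copy of src and return the new file's path."""
    dest = dest or src + ".gz"
    with open(src, "rb") as fin, gzip.open(dest, "wb") as fout:
        # Copies in chunks, so the whole file is never held in memory
        shutil.copyfileobj(fin, fout)
    return dest


def split_fasta(src, max_bytes, prefix="part"):
    """Split a FASTA file into pieces of roughly max_bytes each.

    A new piece is opened only at a '>' header line, so no record is
    cut in half. Returns the list of piece paths in order.
    """
    paths = []
    out = None
    written = 0
    with open(src, "rb") as fin:
        for line in fin:
            at_header = line.startswith(b">")
            if out is None or (at_header and written >= max_bytes):
                if out is not None:
                    out.close()
                path = f"{prefix}_{len(paths) + 1:03d}.fasta"
                paths.append(path)
                out = open(path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return paths
```

For example, `gzip_file("reads.fasta")` would write `reads.fasta.gz` next to a (hypothetical) `reads.fasta`, and `split_fasta("reads.fasta", 900_000_000)` would write `part_001.fasta`, `part_002.fasta`, and so on. Concatenating the pieces in order at the other site restores the original file.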