producing gzip compressed datafiles

Zahra_Zangishei · May 5, 2020, 5:26pm

Hi

Does anyone know if there is a tool or method in Galaxy that can be used to compress a datafile with gzip?

jennaj · May 5, 2020, 8:15pm

Uncompressing gzip or .gz in Galaxy is more common than compressing again.

That said, many datatypes have an option to convert to a compressed version or format. For plain text data, that will most likely be bgzip in Galaxy and the file will end with a .bz2 extension if downloaded.

Click on the pencil icon for the dataset to reach the Edit Attributes forms. Click into the “Convert” tab and the pull-down menu will list the options.

Some converters that only perform compression are also under the tool panel section “Convert Formats”. SAM to BAM and similar tools are located there as well.

FAQ: https://galaxyproject.org/support/metadata/

Thanks!

PS: Others are welcome to add more help! I recall this functionality being discussed but am not aware of current implementations that perform just this single function (any.data > any.data.gz). I could have missed it, or it may be a work in progress.

Zahra_Zangishei · May 5, 2020, 10:20pm

Thanks for your reply.
I actually did a BAM to FASTA conversion that resulted in a big uncompressed output. Now I need to download it and upload it to another server, but that server has an upload limit of 1 GB.
Regarding the “Convert” tab of Edit Attributes, I have already checked it but could not find a suitable option. I also checked the tools under the panel section “Convert Formats” and could not find any there either.
I thought “Text reformatting with awk” might help, so I tried running it with " gzip" or " gzip -c". However, the resulting output is not a compressed file.
I am wondering if there is still a solution!

jennaj · May 7, 2020, 4:48pm

A lot of information is lost in this conversion. What is your goal? What tool are you using for mapping?

Several mapping tools will output the unique original fastq sequences, mapped or unmapped. Most also allow output filtering directly (keeping only reads that pass the mapping filter criteria), so a distinct filter step doesn’t necessarily need to be done after mapping. The result could be fastqsanger outputs from the mapping step, as well as BAM. That is more meaningful content than fasta, assuming you started with fastq reads.

If the total working space is 1 GB, that is a bit limiting. You need room for not just the data, but tool runs and outputs.

If the upload limit is 1 GB per transaction, there are a few choices:

Download the data from Galaxy, compress locally, upload to the other site (not clear if this is another Galaxy or not).
Split the fasta into multiple files then combine once at the other site. If both websites are Galaxy, this can be done without the need to download.
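
If you go the local route, both choices above can be sketched in a few lines of Python using only the standard library. This is a rough sketch, not a Galaxy tool: the function names `gzip_file` and `split_fasta`, the `.partNNN` file naming, and the `records_per_part` parameter are all made up for illustration, and it assumes the downloaded file is plain-text FASTA.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src with gzip; by default writes src + '.gz' next to it."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        # copyfileobj streams in chunks, so the file is never fully in memory
        shutil.copyfileobj(fin, fout)
    return dst


def split_fasta(src, records_per_part):
    """Split a FASTA file into numbered parts without cutting a record in half.

    Returns the list of part paths in order. Any lines before the first
    '>' header are skipped.
    """
    src = Path(src)
    parts = []
    out = None
    count = 0  # records written to the current part
    with open(src) as fin:
        for line in fin:
            if line.startswith(">"):
                # Start a new part at the first header, or when the current one is full
                if out is None or count == records_per_part:
                    if out is not None:
                        out.close()
                    name = f"{src.stem}.part{len(parts) + 1:03d}{src.suffix}"
                    part = src.with_name(name)
                    parts.append(part)
                    out = open(part, "w")
                    count = 0
                count += 1
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return parts
```

For example, `gzip_file("reads.fasta")` (a hypothetical filename) would write `reads.fasta.gz` alongside the original. If the compressed file is still over the limit, `split_fasta("reads.fasta", 500000)` would write `reads.part001.fasta`, `reads.part002.fasta`, and so on. Concatenating the parts in order at the other site restores the original records.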
