Hi,
Does anyone know if there is a tool or method in Galaxy that can be used to compress a data file as gzip?
Hello @Zahra_Zangishei
Uncompressing gzip or .gz in Galaxy is more common than compressing again.
That said, many datatypes have an option to convert to a compressed version or format. For plain text data, that will most likely be bgzip in Galaxy and the file will end with a .bz2 extension if downloaded.
Click on the pencil icon for the dataset to reach the Edit Attributes forms. Open the “Convert” tab and the pull-down menu will list the options.
Some convert functions that only perform compression are also under the tool panel section “Convert Formats”. SAM to BAM and similar tools are located there as well.
FAQ: https://galaxyproject.org/support/metadata/
Thanks!
PS: Others are welcome to add more help! I recall this functionality being discussed but am not aware of current implementations that perform just this single function (any.data > any.data.gz). I could have missed it, or it may be a work in progress.
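For reference, the any.data > any.data.gz step itself is simple to do outside Galaxy once the dataset has been downloaded. Below is a minimal sketch in Python using only the standard library. It is not a Galaxy tool, and the function name `gzip_file` is just an illustrative choice:

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src, dst=None):
    """Compress src to gzip and return the path of the compressed file.

    If dst is not given, ".gz" is appended to the source file name
    (any.data -> any.data.gz). The source file is left in place.
    """
    src = Path(src)
    dst = Path(dst) if dst else src.with_name(src.name + ".gz")
    # Stream in chunks so a large file is never loaded into memory at once.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst
```

Calling `gzip_file("any.data")` on a local copy would write `any.data.gz` next to it. How much smaller the result is depends on the data.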
Thanks for your reply.
I actually did a BAM to FASTA conversion that resulted in a big uncompressed output. Now I need to download it and upload it to another server, but that server has an upload limit of 1 GB.
Regarding the “Convert” tab of Edit Attributes, I have already checked it but could not find a suitable option. I also checked the tools under the panel section “Convert Formats” and could not find any there either.
I thought “Text reformatting with awk” might be helpful, and I tried to run it with "gzip" or "gzip -c". However, the resulting output is not a compressed file.
I am wondering if there is another solution!
A lot of information is lost in this conversion. What is your goal? What tool are you using for mapping?
Several mapping tools will output the original fastq sequences, mapped or unmapped. Most also allow output filtering directly (keeping only reads that pass the mapping filter criteria), so a distinct filter step doesn’t necessarily need to be done after mapping. The result could be fastqsanger outputs from the mapping step, as well as BAM. That is more meaningful content than fasta, assuming you started with fastq reads.
If the total working space is 1 GB, that is a bit limiting. You need room for not just the data, but tool runs and outputs.
If the upload limit is 1 GB per transaction, there are a few choices: