I’m running various tools on a group of datasets (QC, adapter trimming, mapping etc.) and noticed that I’m running out of storage space rather quickly. I was wondering whether generating a collection list of paired-end runs uses the same amount of storage as all the individual files. Shouldn’t a collection list just be a shell for the original datasets?
Yes, a collection is essentially a shell around the original datasets: its elements are clones of those datasets and do not consume extra quota space. New work resulting from running tools will create new datasets, and those do consume additional quota space.
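If you want to check this yourself, a history's total disk size should not grow when you build a collection from datasets already in that history. Here is a minimal sketch using BioBlend (the Python client for the Galaxy API); the URL, API key, and history ID are placeholders, not values from your account:

```python
# Minimal sketch (assumes BioBlend is installed: pip install bioblend).
# The URL, API key, and history ID below are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = "YOUR_HISTORY_ID"

# 'size' is the history's total disk usage in bytes; check it before and
# after building a collection from existing datasets -- it should not change.
history = gi.histories.show_history(history_id)
print(history["name"], history["size"], "bytes")
```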
Note: some tools require uncompressed inputs. If the original data is compressed, Galaxy will create a new uncompressed version at runtime when needed. In some cases it is better to start with uncompressed data, or to permanently delete (purge) the compressed version after the uncompressed version has been created.
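To see whether that has happened in one of your histories, you can list each dataset with its datatype and size and look for compressed/uncompressed pairs (for example, fastqsanger.gz next to fastqsanger). A sketch with BioBlend, again using placeholder credentials and IDs:

```python
# Sketch: list datasets with datatype and size to spot compressed/uncompressed
# duplicates. Placeholders: URL, API key, history ID, dataset ID.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = "YOUR_HISTORY_ID"

for item in gi.histories.show_history(history_id, contents=True):
    if item.get("history_content_type") != "dataset" or item.get("deleted"):
        continue
    details = gi.datasets.show_dataset(item["id"])
    print(details["name"], details["extension"], details["file_size"], "bytes")

# Once you have confirmed an uncompressed copy exists, the compressed
# original can be purged to reclaim its quota, for example:
# gi.histories.delete_dataset(history_id, "COMPRESSED_DATASET_ID", purge=True)
```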
This FAQ explains how to find and manage all of your data:
Was the compressed fastq data uncompressed during your specific processing/tool choices?
If not, there is only one copy of the data, and you can either keep the starting data or purge it once it is no longer needed to free up quota space. You can always download it first as a backup (see the sketch below). The FAQ I shared has many details about ways to do that.
If yes, then you could uncompress the starting data yourself within Galaxy and then purge the compressed version to avoid the duplication. Or, if neither is needed anymore, purge both.
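If you go the download-then-purge route, that step can also be scripted. A minimal sketch with BioBlend, with placeholder URL, key, and IDs, that saves a dataset locally as a backup and then purges it from the history:

```python
# Sketch: back up a dataset locally, then purge it to free quota.
# Placeholders: URL, API key, history ID, dataset ID, output directory.
import os

from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = "YOUR_HISTORY_ID"
dataset_id = "YOUR_DATASET_ID"

# Download to ./backups/ using the dataset's own name as the filename.
os.makedirs("backups", exist_ok=True)
gi.datasets.download_dataset(dataset_id, file_path="backups", use_default_filename=True)

# Purge (permanently delete) the dataset so it no longer counts against quota.
gi.histories.delete_dataset(history_id, dataset_id, purge=True)
```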