Data file storage

Dear Galaxy
I have uploaded a 150 GB transcriptome dataset. How long will it be available in my history?
Thanks

Welcome @Chandima1

We have a guide here with many details. → What should I do if my data exceeds the given 250GB of storage?

For the exact details, you will need to review the terms of service at the server where you are working. Each server has slightly different data retention times.

In general, it is a good idea to save a copy of anything important to a storage location where you have full control (with secured backups). See → FAQ: How can I reduce quota usage while still retaining prior work (data, tools, methods)?

If you have follow-up questions about a specific server, you can share the URL and we can try to clarify further (it will need to be public).

Hope this helps! :slight_smile:

Dear Jenna
Thank you for the warning!
I had the impression that 1 TB of storage is available for active jobs for 30 days. Maybe I am wrong. I am running Trimmomatic. Can I download the history once the job is completed and free up the 250 GB? Please clarify this for me. Thank you

Hi @Chandima1

Yes, the extra 1 TB of storage space at the UseGalaxy.org server is for 30 days then the data is automatically deleted. This operates on a rolling per-file basis from the date that a dataset is first created.

If you move data to or create it directly in the 250 GB of “permanent” storage space, then it will not be deleted on a schedule. The ORG server will keep this data for an indefinite period of time (barring a server data accident of some kind).

You can download data from either of these storage locations using the same methods, and purge data to free up space from either location. Your default 250 GB must be under quota for new jobs to execute, no matter which location the output is writing into.

The language is tricky since we can’t really guarantee data permanence the same way paid data storage products can. It is simply too much data for us to fully replicate across the right kind of RAID hardware. We will try to keep data in the 250 GB of permanent storage “forever”, but we do NOT keep a backup of everything people might miss or find hard to recreate. That is what the disclaimers here are about. → https://usegalaxy.org/static/terms.html.

The public Galaxy servers are best for active project work, not data storage. You can and should offload static and important data to a paid cloud or local data storage location that you have full control over. An S3 bucket you own is one example; Google Cloud is another, as are some forms of Dropbox.

If the data is involved in a publication, use the History Archive function (to prevent changes to your linked data), but still put a copy of that data somewhere you fully control that has a “data backup plan” type of guarantee attached.

With that context, for your question here:

You can run jobs from either space. Any datasets (files) that are over 30 days old will be deleted by us if they are assigned to the 30-day storage location. You can delete data from either storage location to free up space for new data (uploaded or created data). You can also download data from either space.

I’m guessing that you might be thinking of the 30-day space as a temporary place to run jobs, when instead it is an ongoing space for you to use, but all of the files assigned to that space will be 30 days or less in age. Once a file is 31 “days old” we will purge it unless you use the toggle to move it into your permanent storage space.
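If you prefer to script the “download, then purge” workflow described above rather than use the web interface, a rough sketch with the BioBlend client library (the standard Python client for the Galaxy API) follows. The API key, output directory, and the 30-day threshold check are illustrative assumptions; the retention policy itself is enforced server-side, so the age helper here is only a convenience for spotting datasets that are close to expiry.

```python
# Sketch only: BioBlend is a real library, but the API key and output
# directory below are placeholders you would substitute yourself.
from datetime import datetime, timedelta, timezone


def is_older_than(create_time_iso: str, days: int = 30) -> bool:
    """Return True if a Galaxy dataset creation timestamp (ISO format,
    reported in UTC) is more than `days` days in the past."""
    created = datetime.fromisoformat(create_time_iso).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc) - created > timedelta(days=days)


def download_then_purge(gi, history_id: str, out_dir: str) -> None:
    """Download every completed dataset in a history, then purge it to
    free quota. `gi` is a bioblend.galaxy.GalaxyInstance."""
    for ds in gi.histories.show_history(history_id, contents=True):
        if ds.get("state") == "ok" and not ds.get("deleted"):
            gi.datasets.download_dataset(
                ds["id"], file_path=out_dir, use_default_filename=True
            )
            gi.histories.delete_dataset(history_id, ds["id"], purge=True)


# Example wiring (not executed here; requires bioblend and a real API key):
#   from bioblend.galaxy import GalaxyInstance
#   gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
#   for history in gi.histories.get_histories():
#       download_then_purge(gi, history["id"], out_dir="./galaxy_backup")
```

Note that purging is irreversible, so verify the downloads on your own storage before purging anything.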

Am I answering what you need to know? Feel free to ask more or let us know if this is enough! :slight_smile:

Dear Jenna,
Thank you very much for taking the time to explain it. I need the space only to run the operations. Hopefully I will finish in less than 30 days, and I won’t exceed 1 TB. I will have the history saved somewhere else once I complete the analysis. Thank you again
Best regards
Chandima