Storage space overload due to uncompressed files

Dear all, I have a question about storage space and uncompressed files created while jobs are running. I used the SortMeRNA tool to get rid of ribosomal RNA. Unfortunately, it requires uncompressed reads. Since I wanted to process everything at once and my storage space is limited, only two jobs were running while the others were paused. After one of them finished, the others still did not continue, so I deleted and purged the paused jobs in order to start them one by one. However, the uncompressed files from those purged jobs are still there, so my storage space is still full. Since the decompression was performed by the SortMeRNA tool itself, there are as many uncompressed files as compressed ones (I think they should be temporary). Can I delete them, or would that delete the compressed files as well? Alternatively, is it possible to get a larger storage quota for the time SortMeRNA is running, so that I can purge the aligned reads and be below the 250 GB again after the tool is finished?

Hi @f.gather

Yes, some tools require uncompressed datasets, and these are added to the hidden tab of the history.

If you do not need the uncompressed version for downstream tools, you can purge it after the tool that uncompressed it has completed. You can do that directly, in small batches, or using a workflow. Quota grants are also possible.

More practical help about this:

  1. If working directly in the history:
  • click into the hidden tab
  • unhide the dataset
  • delete and purge the dataset
  • if you don’t need any of your hidden files, you can purge all of them in a batch using the gear icon. To make this work, consider using dedicated histories for tools that you know will explode the quota! That lets you get rid of the excess data in one batch. You could also create a mini-workflow covering just a few steps to make this super quick. This is probably how most labs do batch analysis at the public sites – they use the service as a serious computational resource by loading only what they want to process, purging it, then doing the next batch.
  2. If using a workflow (definitely recommended!)
  • there are options to purge some or all intermediate files not needed by downstream tools as part of the workflow processing. That means no tedious by-hand steps.
  • go into the workflow editor to find this.
  • you can also review our workflow tutorials for the exact how-to, along with example template tutorials.
  3. For large groups of data, these early steps on reads can sometimes be processed in batches, since the files are NOT used together yet. This is another way people use the public site as a workflow/computational engine:
  • load up some of the data, and do all the early steps to create the downstream summaries
  • purge the raw data, and sometimes all the reads entirely
  • do the same for the next batch
  • then work with the smaller summary files
  4. If none of this is enough, or you have a lot of data and a prepared workflow (or are maybe working to tune one up), and you are an academic researcher: contact the administrators of the Galaxy server where you are working and request a temporary quota grant. These tend to be short time windows, since a workflow chomps through massive batches of data very quickly.
  • the server has a link to the request form on the homepage, in the section “Our Data Policy”, and the form has space to explain what your goals are.
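The batch-purge step above can also be scripted against the Galaxy REST API instead of clicking through the UI. Below is a minimal sketch, not a definitive implementation: the server URL, API key, and history ID are placeholders you would substitute, and the endpoint paths follow the documented Galaxy history-contents API, so double-check them against your server's version before relying on this.

```python
import json
from urllib import request

# Placeholders -- substitute your own server, key, and history id.
GALAXY_URL = "https://usegalaxy.example"
API_KEY = "YOUR_API_KEY"

def hidden_dataset_ids(contents):
    """Given a history-contents listing (a list of dicts, as returned by
    GET /api/histories/{id}/contents), return the ids of datasets that
    are hidden but not yet purged."""
    return [
        item["id"]
        for item in contents
        if not item.get("visible", True) and not item.get("purged", False)
    ]

def purge_hidden(history_id):
    """Fetch the history contents, then purge every hidden dataset."""
    url = f"{GALAXY_URL}/api/histories/{history_id}/contents?key={API_KEY}"
    with request.urlopen(url) as resp:
        contents = json.load(resp)
    for dataset_id in hidden_dataset_ids(contents):
        # purge=True frees the disk space, rather than just marking deleted
        req = request.Request(
            f"{GALAXY_URL}/api/histories/{history_id}"
            f"/contents/{dataset_id}?key={API_KEY}",
            data=json.dumps({"purge": True}).encode(),
            headers={"Content-Type": "application/json"},
            method="DELETE",
        )
        request.urlopen(req)
```

The same idea can be done with the BioBlend Python client if you prefer a maintained wrapper over raw HTTP calls.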

Hope this helps!
