Large files not downloading completely? Try using a History Archive

Hi, I have run out of space on the usegalaxy.eu server and am trying to download files so I can delete them from my history and free up space on my account. However, I am experiencing an issue when downloading files from my usegalaxy.eu history: the transfer fails at exactly 1.08 GB. I have tried
wget -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and
curl -o pool5 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and each time I check the contents of the archive with tar -tvzf, it reads the file names until about halfway through, and then I get the following error:
tar: Truncated input file (needed 101140480 bytes, only 0 available)
tar: Error exit delayed from previous errors.
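
For reference, one way I checked whether the file was cut short was to compare the size the server reports with what actually landed on disk (same placeholder URL as above; note the server may not send a Content-Length for streamed archives):

# Ask the server for the expected size without downloading the body
curl -sIL 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX' | grep -i content-length
# Compare against the size of the downloaded file
ls -l pool5.tgz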
I have tried on a Mac and on a Windows laptop, with different files, a month apart. I'd appreciate some help with this.
Many thanks,
Ramiro


Hello @Ramiro

This command extracts the archive on Mac/Linux: tar -xvzf <file.tgz>. You may need to quote the file name if it contains spaces, as shown below.
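
For example, with a quoted file name (the name here is just an illustration):

tar -xvzf 'my downloaded collection.tgz'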

The issue could involve the dataset permissions in Galaxy, or possibly your internet connection. Updating curl might also help. Resuming a partial/aborted transfer is also possible (see the sketch below), though that doesn't seem to be your issue.
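
For completeness, resuming typically looks like this (same placeholder URL as in your post; curl's -C - and wget's -c both continue from the end of the partial file):

# Resume a partial download with curl
curl -L -C - -o pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
# Or resume with wget
wget -c -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'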

FAQ: https://galaxyproject.org/support/download-data/

Prior Q&A: Downloading a collection from usegalaxy.org fails to complete

Hope that helps!

Hello @jennaj,
Thanks for your comments. I have updated curl, but the download still does not work. I have also tried resuming the partial downloads with wget and curl, following the commands on the FAQ page, but that does not work either. I am working from home (due to COVID-19) and have tried both over wireless and connected directly to the router. I have also tried from my work server (logging in through a VPN), and the files do not download there either.
I have been looking at other comments on this topic; this matches my case too, and it wasn't solved there either: Downloading Large Files - 1Gb Limit (Galaxy Main | wget | curl | API Key).
Any other suggestions? Many thanks,
Ramiro


Thanks for the update.

The transfer stopping around 1 GB was a known issue with the curl utility (unrelated to Galaxy), but it is difficult to know whether that is the problem here.

One potential workaround is to copy just the datasets you want to download into a new history, then set that history to a shared state (accessible by link, under User > Histories > “Share or Publish” in the per-history pull-down menu). Then, under the History menu (gear icon), export that subset history to an archive. The export takes time to compress; click on the link to refresh the status. Once ready, the message will update and the history archive can be downloaded. Your datasets will be inside.
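
Once the archive is ready, it can be downloaded from the browser, or from the command line if that is easier on a slow connection. A minimal sketch, assuming you copy the download link from the export page (the link below is a placeholder, not a real Galaxy URL):

# Download the exported history archive using the link from the export page
wget -O history-archive.tar.gz '<link-from-the-export-page>'
# Quick look at the contents before deleting anything in Galaxy
tar -tvzf history-archive.tar.gz | head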

Let’s also get some input from the developers/admins and see if they have other ideas about what might be going wrong with collection downloads. I started a Gitter chat here: https://gitter.im/galaxyproject/Lobby?at=5f36bb4060892e0c69702c32

Feel free to comment at Gitter, too.

Update: More feedback in the Gitter thread.

Try the history archive workaround. The data is pre-compressed with that method, which avoids “temporary data caching” limitations.

Copies of your own datasets do not consume any additional quota space, so you can reorganize any way you want. But once the data is downloaded, you’ll need to purge all copies to free up space. Be sure to check the downloaded data first (before purging!) so you don’t lose anything important; one way to check is sketched below.
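
A simple sanity check before purging (the file name is a placeholder): make sure the archive lists all the way to the end without the truncation error you saw earlier, and keep a checksum in case you want to re-verify later.

# Listing must finish without "Truncated input file" errors
tar -tvzf history-archive.tar.gz > /dev/null && echo 'archive reads cleanly'
# Record a checksum for later re-verification (on Linux, use sha256sum)
shasum -a 256 history-archive.tar.gz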

We’ll probably be looking into better solutions but this should get you moving forward for now.

Great, thank you for the help. I will try the suggested workarounds and let you know here.
Thank you!


Dear @jennaj,

An update on the subject: the workaround you proposed of creating a history archive and downloading it has worked, and there does not appear to be any error in the downloaded data.

Thanks for your help.


Glad that worked, and I appreciate the follow-up! The compression happens a bit differently with that method: the archive is created before the data starts downloading, versus a “streaming” download. That is easier to manage on both sides (server and you), especially when using a home or slower internet connection to transfer large data.

Updated solution: Data download truncated? Try downloading large data in a history archive instead of directly from a dataset/dataset-collection