Large files not downloading completely? Try using a History Archive

Hi, I have run out of space on the usegalaxy.eu server and am trying to download files so I can delete them from my history and free up space on my account. However, I am experiencing an issue when downloading files from my usegalaxy.eu history: the transfer fails at exactly 1.08 GB. I have tried
wget -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and
curl -o pool5 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and each time I check the contents of the archive with tar -tvzf, it reads the file names until about halfway through, and then I get the following error:
tar: Truncated input file (needed 101140480 bytes, only 0 available)
tar: Error exit delayed from previous errors.
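
For reference, one way I checked whether the file was cut short was to compare the size the server reports with what actually landed on disk (same placeholder URL as above; note the server may not send a Content-Length for streamed archives):

# Ask the server for the expected size without downloading the body
curl -sIL 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX' | grep -i content-length
# Compare against the size of the downloaded file
ls -l pool5.tgz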
I have tried on a Mac and on a Windows laptop, with different files, a month apart. I'd appreciate some help with this.
Many thanks,
Ramiro


Hello @Ramiro

This command extracts the archive on Mac/Linux: tar -xvzf <file.tgz>. You may need to quote the file name if it contains spaces, as shown below.
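
For example, with a quoted file name (the name here is just an illustration):

tar -xvzf 'my downloaded collection.tgz'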

The issue could involve the dataset permissions in Galaxy, or possibly your internet connection. Updating curl might also help. Resuming a partial/aborted transfer is also possible (see the sketch below), though that doesn't seem to be your issue.
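
For completeness, resuming typically looks like this (same placeholder URL as in your post; curl's -C - and wget's -c both continue from the end of the partial file):

# Resume a partial download with curl
curl -L -C - -o pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
# Or resume with wget
wget -c -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'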

FAQ: https://galaxyproject.org/support/download-data/

Prior Q&A: Downloading a collection from usegalaxy.org fails to complete

Hope that helps!

Hello @jennaj,
Thanks for your comments. I have updated curl, but the download still does not work. I have also tried resuming the partial downloads with wget and curl, following the commands on the FAQ page, but that does not work either. I am working from home (due to COVID-19) and have tried both over wireless and connected directly to the router. I have also tried from my work server (logging in through a VPN), and the files do not download there either.
I have been looking at other comments on this topic; this matches my case too, and it wasn't solved there either: Downloading Large Files - 1Gb Limit (Galaxy Main | wget | curl | API Key).
Any other suggestions? Many thanks,
Ramiro


Thanks for the update.

The transfer stopping around 1 GB was a known issue with the curl utility (unrelated to Galaxy), but it is difficult to know whether that is the problem here.

One potential workaround is to copy just the datasets you want to download into a new history, then set that history to a shared state (accessible by link, under User > Histories > “Share or Publish” in the per-history pull-down menu). Then, under the History menu (gear icon), export that subset history to an archive. The export takes time to compress; click on the link to refresh the status. Once ready, the message will update and the history archive can be downloaded. Your datasets will be inside.
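
Once the archive is ready, it can be downloaded from the browser, or from the command line if that is easier on a slow connection. A minimal sketch, assuming you copy the download link from the export page (the link below is a placeholder, not a real Galaxy URL):

# Download the exported history archive using the link from the export page
wget -O history-archive.tar.gz '<link-from-the-export-page>'
# Quick look at the contents before deleting anything in Galaxy
tar -tvzf history-archive.tar.gz | head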

Let’s also get some input from the developers/admins and see if they have other ideas about what might be going wrong with collection downloads. I started a Gitter chat here: https://gitter.im/galaxyproject/Lobby?at=5f36bb4060892e0c69702c32

Feel free to comment at Gitter, too.

Update: More feedback in the Gitter thread.

Try the history archive workaround. The data is pre-compressed with that method, which avoids “temporary data caching” limitations.

Copies of your own datasets do not consume any additional quota space, so you can reorganize any way you want. But once the data is downloaded, you’ll need to purge all copies to free up space. Be sure to check the downloaded data first (before purging!) so you don’t lose anything important; one way to check is sketched below.
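
A simple sanity check before purging (the file name is a placeholder): make sure the archive lists all the way to the end without the truncation error you saw earlier, and keep a checksum in case you want to re-verify later.

# Listing must finish without "Truncated input file" errors
tar -tvzf history-archive.tar.gz > /dev/null && echo 'archive reads cleanly'
# Record a checksum for later re-verification (on Linux, use sha256sum)
shasum -a 256 history-archive.tar.gz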

We’ll probably be looking into better solutions but this should get you moving forward for now.

Great, thank you for the help. I will try the suggested workarounds and let you know here.
Thank you!


Dear @jennaj,

An update on the subject: the workaround you proposed of creating a history archive and downloading it has worked, and there does not appear to be any error in the downloaded data.

Thanks for your help.


Glad that worked, and I appreciate the follow-up! The compression happens a bit differently with that method: the archive is created before the data starts downloading, versus a “streaming” download. That is easier to manage on both sides (server and you), especially when using a home or slower internet connection to transfer large data.

Updated solution: Data download truncated? Try downloading large data in a history archive instead of directly from a dataset/dataset-collection