I’m a relatively savvy Galaxy user, but just today we started experiencing issues downloading any file larger than 1 GB from Galaxy Main. Irrespective of the original file size, the resulting downloaded archive is 1.08 GB and fails to expand on opening.
I am downloading over a good, university-based internet connection and am doing nothing different from usual!
So, I’ve checked locally and there is no traffic/network shaping in place. I’ve also repeated the steps below on two different computers:
(1) Tried to download a collection >1 GB using my API key, as previously, with both curl and wget - all fail at 1.08 GB. Attempting to download a single file from within a collection without the API fails due to authentication.
(2) Tried to download a single dataset using direct wget/curl without the API - wget fails, but curl works.
(3) Tried three different web browsers to download either a collection or a single file >1 GB - all fail at 1.08 GB.
Sorry that you are having problems and thanks for doing all of the troubleshooting.
Which Galaxy server are you working at? You state Galaxy Main but we’d like to confirm that with the URL.
Also, to help expedite the troubleshooting, share back some of the command strings that you used and note which failed/worked. It is OK to mask out the actual dataset http address and API keys with xxx, but leave all other content, including punctuation, intact. Use “Pre-formatted” text so that any spaces, etc. are preserved.
We can follow up from there, either here publicly or in a direct message, based on your reply.
NB: I’ve redacted the data ID and my API key using XXX - I can confirm the correct details were provided in the command.
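The command was of this general form (shown here as a sketch: the collection ID, API key, and exact endpoint path are placeholders):
wget -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'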
Running this command twice results in two different file sizes, as below, neither of which is anywhere near the expected size - I have 20 files in the collection, many of which are several GB each:
MyTestArchive.tgz [ <=> ] 175.44M 6.60MB/s in 43s
MyTestArchive.tgz [ <=> ] 208.08M 3.18MB/s in 48s
Command to download a single file from within a collection [WORKS]:
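A sketch of the general form (the dataset ID and file extension are placeholders; the exact link comes from the dataset’s disk icon):
curl -o single_file.fastq 'https://usegalaxy.org/datasets/XXX/display?to_ext=fastq'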
Conclusions: individual files from within a collection appear to be accessible and download correctly using wget or curl. However, when attempting to download the entire collection (by capturing the collection URL from the disk icon), the download stops at an apparently random point. Yesterday I found this to be ~1.08 GB, whereas today it appears to be ~200 MB.
Daniel, we can help in the morning here. Apologies for the delay. This should be working, but we can sort out why it isn’t.
If you want to send me a direct message with a share link to the history with the collection you are trying to download (be sure to share the “objects”) that will help to jump-start the troubleshooting. Or, if your registered email here is the same as at https://usegalaxy.org, just message the history name and dataset number (direct message is fine for that as well). I’m an administrator at both places but need to know where to look. I’m pretty sure that I know your account email from prior help but that is still good to confirm and can be done privately.
Reminder: Never share your account password with anyone. An admin wouldn’t need it.
@KEELE Yes, the data is large. The connection dropped for me too.
I remembered that this has come up before. See this prior Q&A on how to “resume” downloads with curl and wget and see if one of those works for you.
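In short, the resume flags look like this (a sketch, assuming the server honors HTTP Range requests; substitute your own URL and filename):
wget -c -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'
curl -C - -o MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'
wget -c continues a partial file, and curl -C - resumes from the current size of the output file.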
Your other option is to create a History archive and download that. Once uncompressed, you’ll find your data inside of it. Tip: Copy just the data you need into a new history to make it smaller/faster to process this way. Copies of datasets you already have in your account do not consume any additional account quota space. I’m guessing that you don’t need everything in the original history, just the results, but either way should work. It just takes longer to create, and then download or import into another Galaxy server, a really large history archive.
The option to create a History archive, and to download it or generate a link to it, is under the History menu (gear icon).
If you decide to go this route, be sure to create a share link to the history (and its objects) before creating the archive (just a link is fine, you don’t need to publish it). Sharing afterwards, while the archive is being created, doesn’t work as well.
Hi, I am reviving this older conversation because I am experiencing exactly the same issue when trying to download a collection from the usegalaxy.eu server: it fails at exactly 1.08 GB. I have tried
wget -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and
curl -o pool5 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and each time I try to read the names of the contents with tar -zvft, it starts listing the file names until it gets about halfway through, and then I get the following error:
tar: Truncated input file (needed 101140480 bytes, only 0 available)
tar: Error exit delayed from previous errors.
I have tried on both a Mac and a Windows laptop. I’d appreciate some help with this.
Many thanks,
Ramiro
Downloading larger datasets in a history archive (potentially subsetted to only include important data) is the current workaround. More details in this post: Files will not download completely
I believe this was due to the nginx proxy buffer temp-file max size (proxy_max_temp_file_size) being 1 GB by default. We probably can’t support buffer temp files large enough for collections, but I have disabled buffering for collection downloads on usegalaxy.org entirely, which may fix this issue (hopefully without creating more issues). Please give it a try and let us know if it’s still not working.
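For other admins hitting the same limit, a minimal sketch of the nginx change, assuming collection downloads are proxied under a location like /api/dataset_collections/ (your location block and upstream name will differ):

location /api/dataset_collections/ {
    proxy_pass http://galaxy;
    proxy_buffering off;  # stream the response straight to the client, so no temp-file cap applies
}

Alternatively, proxy_max_temp_file_size can be raised, or set to 0 to keep in-memory buffering while disabling the on-disk temp file.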