Downloading Large Files - 1 GB Limit (Galaxy Main | wget | curl | API Key)

I’m a relatively savvy Galaxy user; however, just today we started experiencing issues downloading any file larger than 1 GB from Galaxy Main. Irrespective of the actual file size, the resulting downloaded archive is 1.08 GB and fails to expand on opening.

I am downloading using a good internet connection (University-based) and am doing nothing different than usual!

Please help!

Best wishes,

D


So, I’ve checked locally and there is no traffic/network shaping in place. I’ve also repeated the following on two different computers:

(1) Tried to download a collection >1 GB using an API key as before, with both curl and wget - all attempts fail at 1.08 GB. Attempting to download a single file from within a collection without the API fails due to authentication.

(2) Tried to download a single dataset directly with wget/curl, without the API - wget fails, but curl works.

(3) Tried three different web browsers to download either a collection or a single file >1 GB - all fail at 1.08 GB.

Using macOS Mojave.


Hi @KEELE

Sorry that you are having problems and thanks for doing all of the troubleshooting.

Which Galaxy server are you working at? You state Galaxy Main but we’d like to confirm that with the URL.

Also, to help expedite the troubleshooting, share back some of the command strings that you used and note which failed/worked. It is OK to mask out the actual dataset HTTP address and API keys with xxx, but leave all other content, including punctuation, intact. Use “Pre-formatted” text so that any spaces, etc. are preserved.

We can follow up from there, either here publicly or in a direct message, based on your reply.

Many thanks for your response :slight_smile: I have done some further investigation:

(1) I’m using usegalaxy.org

Command to download a collection [FAILS]:

wget -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/bcaddaXXXe5aafbb/download'?key=90a1876062110873XXXXX38a426cca8a

NB: I’ve redacted the data ID and my API key using XXX - I can confirm the correct details were provided in the command.

Running this command twice results in two different file sizes, as below, neither of which is anywhere near the expected size - I have 20 files in the collection, many of which are several GB each:

MyTestArchive.tgz [ <=> ] 175.44M 6.60MB/s in 43s

MyTestArchive.tgz [ <=> ] 208.08M 3.18MB/s in 48s

Command to download a single file from within a collection [WORKS]:

wget -O SingleFileWithinCollection.tgz 'https://usegalaxy.org/datasets/bbdXXXXXcb8906b5fc7e7eced1d68b4e/display?to_ext=data&hdca_id=bcadda227e5aafbb&element_identifier=ID1-DZ_A_TTACCGAC-CGTATTCG_L008.fastq.gz'?key=90a1876062110873XXXXX38a426cca8a

Conclusions: individual files from within a collection appear to be accessible and download correctly using wget or curl. However, when attempting to download the entire collection (by capturing the collection URL from the disk icon), the download stops at an apparently random point. Yesterday I found this to be ~1.08 GB, whereas today it appears to be ~200 MB.

Your help is most appreciated.

Deleted for clarity; only partial first response published.

For everyone’s benefit, adding a hard line to an email cuts off the response! :slight_smile:

Deleted for clarity; only partial first response published.

Can anyone help with the download of collections please?


@KEELE

Daniel, we can help in the morning here. Apologies for the delay. This should be working, but we can sort out why it isn’t.

If you want to send me a direct message with a share link to the history with the collection you are trying to download (be sure to share the “objects”) that will help to jump-start the troubleshooting. Or, if your registered email here is the same as at https://usegalaxy.org, just message the history name and dataset number (direct message is fine for that as well). I’m an administrator at both places but need to know where to look. I’m pretty sure that I know your account email from prior help but that is still good to confirm and can be done privately.

Reminder: Never share your account password with anyone. An admin wouldn’t need it.

Dear JennaJ,

Many thanks indeed.

I have done so.

Best wishes,

D


@KEELE Yes, the data is large. The connection dropped for me too.

I remembered that this has come up before. See this prior Q&A on how to “resume” downloads with curl and wget and see if one of those works for you.
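As a quick sketch of what “resume” looks like on the command line (using the same collection URL pattern and placeholder IDs/API key as earlier in this thread - substitute your own), and assuming the server accepts byte-range requests for that endpoint:

# wget: -c continues a partially downloaded file in place
wget -c -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXXX/download?key=XXXX'

# curl: -C - works out the resume offset from the existing output file
curl -C - -o MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXXX/download?key=XXXX'

Re-running either command after an interrupted transfer picks up where it left off rather than starting over, provided the endpoint supports range requests for the streamed archive.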

Your other option is to create a History archive and download that. Once uncompressed, you’ll find your data inside it. Tip: Copy just the data you need into a new history to make it smaller/faster to process this way. Copies of datasets you already have in your account do not consume any additional quota space. I’m guessing that you don’t need everything in the original history, just the results, but either way should work. A really large history archive just takes longer to create, and then to download or import into another Galaxy server.

  • The option to create and download/generate a link to a History archive is under the History menu (gear icon).
  • The option to import a History archive (an already downloaded archive file, or a URL from a publicly accessible Galaxy server) is at the top of the Saved History page, if you have a local Galaxy and want to store data in context.

Be sure to create a share link to the history (and objects; just a link is fine, you don’t need to publish it) before creating the archive, if you decide to go this route. Sharing after the fact (while the archive is being created) doesn’t work as well.
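Regarding actually downloading the generated archive link: as a rough illustration only (the placeholder below stands in for whatever link Galaxy generates for your history export), a resumable transfer can simply be retried until it completes:

URL='PASTE_THE_GENERATED_ARCHIVE_LINK_HERE'
# -C - resumes from the existing partial file; -f fails on HTTP errors; -L follows redirects
until curl -C - -fL -o MyHistoryArchive.tar.gz "$URL"; do
    echo "Download interrupted, retrying..."
    sleep 5
done

This is just a convenience wrapper around the same resume behaviour described above; if the endpoint does not support range requests, curl will refuse to resume, so the partial file would need to be removed before starting again.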

Hi, I am reviving this older conversation because I am suffering from exactly the same issue when trying to download a collection from the usegalaxy.eu server; it fails at exactly 1.08 GB. I have tried
wget -O pool5.tgz https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX
and
curl -o pool5 https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX
and each time I try to read the names of the contents with tar -tzvf, it starts listing the file names until it gets about halfway through, and then I get the following error:
tar: Truncated input file (needed 101140480 bytes, only 0 available)
tar: Error exit delayed from previous errors.
I have tried on a Mac and on a Windows laptop. I’d appreciate some help with this.
Many thanks,
Ramiro

Dear Ramiro,

I never did solve this problem, I’m afraid, so I am of little help here, with regret.

Thanks anyway, I hope someone has an idea of how to solve this.

Downloading larger datasets in a history archive (potentially subsetted to only include important data) is the current workaround. More details in this post: Files will not download completely

I believe this was due to the nginx buffer temp file max size being 1GB by default. We probably can’t support buffer temp files large enough for collections, but I have disabled buffering for collection downloads on usegalaxy.org entirely, which may fix this issue (hopefully without creating more issues). Please give it a try and let us know if it’s still not working.
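For anyone administering their own Galaxy behind nginx and hitting the same truncation, here is a minimal sketch of the kind of change described above - the location path and upstream name are assumptions, not the actual usegalaxy.org configuration:

# Collection downloads are streamed tar archives; with response buffering on,
# nginx spools them to a temp file capped by proxy_max_temp_file_size
# (1024m by default), which per the explanation above appears to be where
# these downloads were getting cut off around 1 GB.
location /api/dataset_collections {
    proxy_pass      http://galaxy;   # assumed upstream name
    proxy_buffering off;             # stream the response straight to the client
}

Disabling buffering means nginx passes the backend response directly instead of spooling it to disk, at the cost of tying up a backend worker for the duration of a slow client’s download.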
