I’m a relatively savvy Galaxy user, but just today we started experiencing issues downloading any file larger than 1 GB from Galaxy Main. Irrespective of the original file size, the resulting downloaded archive is 1.08 GB and fails to expand on opening.
I am downloading over a good, university-based internet connection and am doing nothing different from usual!
So, I’ve checked locally and there is no traffic/network shaping in place. I’ve also repeated the steps below on two different computers:
(1) Tried to download a collection >1 GB using my API key, as previously, with both curl and wget - all fail at 1.08 GB. Attempting to download a single file from within a collection without the API fails due to authentication.
(2) Tried to download a single dataset using direct wget/curl without the API - wget fails, but curl works.
(3) Tried three different web browsers to download either a collection or a single file >1 GB - all fail at 1.08 GB.
Sorry that you are having problems and thanks for doing all of the troubleshooting.
Which Galaxy server are you working at? You state Galaxy Main but we’d like to confirm that with the URL.
Also, to help expedite the troubleshooting, share back some of the command strings that you used and note which failed/worked. It is OK to mask out the actual dataset http address and API keys with xxx, but leave all other content, including punctuation, intact. Use “Pre-formatted” text so that any spaces, etc. are preserved.
We can follow up from there, either here publicly or in a direct message, based on your reply.
NB: I’ve redacted the data ID and my API key using XXX - I can confirm the correct details were provided in the command.
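The command was of this general form (shown here as a sketch: the collection ID, API key, and exact endpoint path are placeholders):
wget -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'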
Running this command twice results in two different file sizes, as below, neither of which is anywhere near the expected size - I have 20 files in the collection, many of which are several GB each:
MyTestArchive.tgz [ <=> ] 175.44M 6.60MB/s in 43s
MyTestArchive.tgz [ <=> ] 208.08M 3.18MB/s in 48s
Command to download a single file from within a collection [WORKS]:
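A sketch of the general form (the dataset ID and file extension are placeholders; the exact link comes from the dataset’s disk icon):
curl -o single_file.fastq 'https://usegalaxy.org/datasets/XXX/display?to_ext=fastq'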
Conclusions: individual files from within a collection appear to be accessible and download correctly using wget or curl. However, when attempting to download the entire collection (by capturing the collection URL from the disk icon), the download stops at an apparently random point. Yesterday I found this to be ~1.08 GB, whereas today it appears to be ~200 MB.
Daniel, we can help in the morning here. Apologies for the delay. This should be working, but we can sort out why it isn’t.
If you want to send me a direct message with a share link to the history with the collection you are trying to download (be sure to share the “objects”) that will help to jump-start the troubleshooting. Or, if your registered email here is the same as at https://usegalaxy.org, just message the history name and dataset number (direct message is fine for that as well). I’m an administrator at both places but need to know where to look. I’m pretty sure that I know your account email from prior help but that is still good to confirm and can be done privately.
Reminder: Never share your account password with anyone. An admin wouldn’t need it.
@KEELE Yes, the data is large. The connection dropped for me too.
I remembered that this has come up before. See this prior Q&A on how to “resume” downloads with curl and wget and see if one of those works for you.
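In short, the resume flags look like this (a sketch, assuming the server honors HTTP Range requests; substitute your own URL and filename):
wget -c -O MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'
curl -C - -o MyTestArchive.tgz 'https://usegalaxy.org/api/dataset_collections/XXX/download?key=XXX'
wget -c continues a partial file, and curl -C - resumes from the current size of the output file.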
Your other option is to create a History archive and download that. Once uncompressed, you’ll find your data inside of it. Tip: Copy just the data you need into a new history to make it smaller/faster to process this way. Copies of datasets you already have in your account do not consume any additional account quota space. I’m guessing that you don’t need everything in the original history, just the results, but either way should work. It just takes longer to create, and then download or import into another Galaxy server, a really large history archive.
The option to create a History archive, and to download it or generate a link to it, is under the History menu (gear icon).
If you decide to go this route, be sure to create a share link to the history (and its objects) before creating the archive (just a link is fine, you don’t need to publish it). Sharing afterwards, while the archive is being created, doesn’t work as well.
Hi, I am reviving this older conversation because I am experiencing exactly the same issue when trying to download a collection from the usegalaxy.eu server: it fails at exactly 1.08 GB. I have tried
wget -O pool5.tgz 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and
curl -o pool5 'https://usegalaxy.eu/api/dataset_collections/XXXX/download?key=XXXX'
and each time I try to read the names of the contents with tar -zvft, it starts listing the file names until it gets about halfway through, and then I get the following error:
tar: Truncated input file (needed 101140480 bytes, only 0 available)
tar: Error exit delayed from previous errors.
I have tried on both a Mac and a Windows laptop. I’d appreciate some help with this.
Many thanks,
Ramiro
Downloading larger datasets in a history archive (potentially subsetted to only include important data) is the current workaround. More details in this post: Files will not download completely
I believe this was due to the nginx proxy buffer temp-file max size (proxy_max_temp_file_size) being 1 GB by default. We probably can’t support buffer temp files large enough for collections, but I have disabled buffering for collection downloads on usegalaxy.org entirely, which may fix this issue (hopefully without creating more issues). Please give it a try and let us know if it’s still not working.
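For other admins hitting the same limit, a minimal sketch of the nginx change, assuming collection downloads are proxied under a location like /api/dataset_collections/ (your location block and upstream name will differ):

location /api/dataset_collections/ {
    proxy_pass http://galaxy;
    proxy_buffering off;  # stream the response straight to the client, so no temp-file cap applies
}

Alternatively, proxy_max_temp_file_size can be raised, or set to 0 to keep in-memory buffering while disabling the on-disk temp file.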