Trouble downloading and unzipping data

I have been trying to download different datasets from my history, but am running into errors when trying to unzip them.

First, I can download a zipped collection of my HTML report outputs from fastp, but when I try to unzip it with Windows, I get an unspecified error. I have tried unzipping it with WinRAR, but it says the files are corrupted. This also happens when I download HTML reports from FastQC, and even when I download the HTML file for a single sample, as Galaxy still zips the report.

I am using Chrome, but have also tried this on two different computers with the same result.

Any clues on what is going on here? This problem only popped up in about the past week. I had no trouble unzipping my files before.
Thank you!

Hi @Danielle_Ireland

I think we got back to you by email with this exact help, but I’ll just post the same for others reading.

HTML output doesn’t download from the disc icon in any browser that I know of. The browser doesn’t understand what to do: it creates a gzip “directory” and then attempts the download, but the download never finishes. Why? The data isn’t really a compressed directory; it is just a single file. HTML data is handled in a special way in Galaxy for security reasons. You cannot upload HTML data at all, and downloading requires an alternate method involving the command line.

FAQ that applies to any composite dataset, or security protected dataset in Galaxy:

On a Mac, you could use the Terminal program already built into the OS to access a Unix-compatible command-line tool (wget, curl). On a PC, you could use a utility like PuTTY (https://www.putty.org/) or whatever you normally use for Linux/Unix access.


Example for HTML data specifically:

curl -o out.html --insecure 'link'

Where the original link copied from the dataset is something like this:

https://usegalaxy.org/api/datasets/f9cad7b01a472135987ad5307248c882/display?to_ext=html

Remove the display-type parameter (to_ext=html) from the end, so that the URL used in place of 'link' is this:

https://usegalaxy.org/api/datasets/f9cad7b01a472135987ad5307248c882/display?

A complete command string would be like this one. It happens to be an actual (small, test data) FastQC webpage dataset if anyone wants to test it. I’ll leave it undeleted for now.

curl -o out.html --insecure 'https://usegalaxy.org/api/datasets/f9cad7b01a472135987ad5307248c882/display?'
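
If you do this often, the link edit described above can be scripted. The following is only a minimal sketch, assuming a POSIX-style shell with curl installed; the function names strip_to_ext and fetch_dataset are made up for illustration and are not part of Galaxy. The --insecure flag is copied from the command above and can be dropped if certificate checks work on your machine.

```shell
# Remove the trailing to_ext=... parameter from a copied Galaxy dataset link.
# A link without that parameter is returned unchanged.
strip_to_ext() {
  printf '%s\n' "${1%%to_ext=*}"
}

# Download one dataset: first argument is the copied link,
# second argument is the name of the file to write.
fetch_dataset() {
  curl -o "$2" --insecure "$(strip_to_ext "$1")"
}
```

With those defined, fetch_dataset 'https://usegalaxy.org/api/datasets/f9cad7b01a472135987ad5307248c882/display?to_ext=html' out.html should run the same curl command shown above.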

Note: If you happen to be working on a GDPR-compliant server, like UseGalaxy.eu, you’ll also need to set the history that contains the datasets to a “shared-by-link” state and include your API key. Full details are in the FAQ above.

Hope that helps!

Hi @jennaj
Thank you for responding here as I don’t believe I ever received an email about this particular issue.
My problem is not just with HTML datasets: any collection that I try to download (and that should be a reasonable size) cannot be extracted by Windows. Before last week, I was able to download an entire collection at once for things like fastp HTML reports, but now I cannot. This is also a problem with my FastQC Raw Data collection, which contains txt files. I have a collection of 4 paired-end samples. I can download each txt file individually, so at least I can get the data, but in the past I have definitely been able to download the entire collection at once. For some reason the resulting zipped files will not extract anymore.

It seems like using the command line approach you provided may be the most robust way to get my data going forward.

Thanks,
Danielle

Hi @Danielle_Ireland

Ok – then that was another user who reported the same problem in a very similar way to the UseGalaxy.org mailing list.

From what I can tell after reviewing this more, this does turn out to be an actual bug. I’ve opened an issue ticket for expanded review by our developers, and updates will be posted there: “Bug: FastQC webpage output fails complete download from disc icon” (Issue #14365, galaxyproject/galaxy on GitHub).

I’m going to mark this reply as the “solution” to make it easier for others to find the Q&A, but also add the tag “server open issue” for accuracy as this functionality is still pending a bug remedy.


The workaround until the fix is made (in the tool or in Galaxy itself) is the command-line method.
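
For a whole collection, the same command-line method can be repeated once per dataset. Below is a rough sketch, not a tested Galaxy feature: it assumes you have pasted each dataset's copied link on its own line in a text file, and the function name galaxy_curl_commands and the dataset_N output names are invented for this example. It only prints the curl commands so you can review them first. On a GDPR-compliant server such as UseGalaxy.eu, the sharing and API key steps mentioned earlier would still apply and are not handled here.

```shell
# Print one curl command per dataset link listed in a text file.
#   $1: file with one copied dataset link per line (hypothetical, you create it)
#   $2: extension for the saved files, e.g. html or txt
galaxy_curl_commands() {
  links_file=$1
  ext=$2
  n=0
  # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
  while IFS= read -r link || [ -n "$link" ]; do
    [ -n "$link" ] || continue        # skip blank lines
    n=$((n + 1))
    url=${link%%to_ext=*}             # drop the trailing to_ext=... parameter
    printf "curl -o dataset_%d.%s --insecure '%s'\n" "$n" "$ext" "$url"
  done < "$links_file"
}
```

Running galaxy_curl_commands links.txt txt lists the commands; once they look right, piping that output to sh would run the downloads.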