Unable to upload a dataset via the API using a URL requiring authentication

Hi,

I am developing a data warehouse and want to provide access to bioinformatics analyses through a Galaxy instance. For now, I am working with a local Galaxy instance.

I am stuck at the step of uploading a dataset. I want to use the put_url function to upload a dataset using a URL from the data warehouse. However, to access this dataset, I need to authenticate to the data warehouse with a token.

I provided the token in the GalaxyInstance headers. The following code works fine, but it means downloading the dataset content locally and then sending it back to Galaxy.

from bioblend import galaxy

gi = galaxy.GalaxyInstance(url=<local-url>, key=API_KEY)

# Add the data warehouse token to the headers sent by BioBlend
gi.json_headers["Authorization"] = "Bearer " + data_warehouse_apiKey
gi.json_headers["Accept"] = "*/*"

# Download the dataset content locally, then pass it to Galaxy
response = gi.make_get_request(<dataset-url>)
gi.tools.put_url(response.content.decode(), <hist_id>, file_name="test_dataset")

I get an “HTTP Error 401” if I run:

from bioblend import galaxy

gi = galaxy.GalaxyInstance(url=<local-url>, key=API_KEY)

# Same headers as above
gi.json_headers["Authorization"] = "Bearer " + data_warehouse_apiKey
gi.json_headers["Accept"] = "*/*"

# Pass the protected URL directly to Galaxy: this fails with HTTP Error 401
gi.tools.put_url(<dataset-url>, <hist_id>, file_name="test_dataset")

I tried with a dataset available on Zenodo, which does not need authentication, and the put_url function works fine.

How can I upload a dataset to a Galaxy history using a URL requiring authentication?

Thanks in advance,

Agnès Barnabé

Hi @agnesbrnb

Any dataset loaded by URL through the Upload tool needs a “public” link that points directly to a file (no intermediate handshakes), because the Galaxy server fetches the URL itself. That is how the “galaxy” user accesses the data file, assigns your account permissions, and keeps it private once in Galaxy.

There are two common ways around this, and the first is probably what you are looking for.

  1. Admin method: Load files into a Data Library as an administrator using one of the other methods explained here. → Hands-on: Data Libraries / Data Libraries / Galaxy Server administration

    Once the data is in a library, you can set the sharing permissions for groups of users or for all users, then they can import the datasets into a history to work with them.

    Users find and select this data under the Libraries link in the left Activity bar (most public servers use this, if you want to see what it looks like on one of them). We use this in tutorials when training, and you could put up a custom guide on your server, too, for any special instructions. It works about the same as the other “copy datasets” functions.

  2. User method: Configure one of the User preferences → Manage Your Remote File Sources.

    This is where the credentials are saved, and you can navigate the resources (per account) under Upload → Choose remote files.

Both of these avoid exposing credentials through insecure URLs.
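
For the admin method, here is a minimal sketch of how this could look from a script with BioBlend. It assumes the API key belongs to a Galaxy admin and that the data warehouse accepts a bearer token in the Authorization header, as in your example. The helper names fetch_with_token and stage_in_library are made up for illustration. The BioBlend calls used are libraries.create_library, libraries.upload_file_from_local_path and histories.upload_dataset_from_library. The token is only sent to the data warehouse, never to Galaxy.

```python
import shutil
import tempfile
import urllib.request


def fetch_with_token(url, token):
    """Download `url` with a bearer token and return the local file path.

    Hypothetical helper: assumes the data warehouse accepts
    "Authorization: Bearer <token>".
    """
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            # Stream to disk instead of holding the whole file in memory
            shutil.copyfileobj(response, tmp)
            return tmp.name


def stage_in_library(gi, local_path, library_name, history_id):
    """Upload a local file to a new Data Library, then import it into a history.

    Hypothetical helper: `gi` is a bioblend.galaxy.GalaxyInstance created
    with an admin API key.
    """
    library = gi.libraries.create_library(library_name)
    uploaded = gi.libraries.upload_file_from_local_path(library["id"], local_path)
    # The upload returns a list with one entry per library dataset created
    return gi.histories.upload_dataset_from_library(history_id, uploaded[0]["id"])


# Example usage (needs a running Galaxy; all values are placeholders):
# from bioblend import galaxy
# gi = galaxy.GalaxyInstance(url=LOCAL_URL, key=ADMIN_API_KEY)
# path = fetch_with_token(DATASET_URL, data_warehouse_apiKey)
# stage_in_library(gi, path, "data warehouse", HIST_ID)
```

The temporary file is not deleted automatically, so remove it once the upload has finished. Once the dataset is in the library, other users can import it without ever needing the data warehouse token.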

I hope this helps, and we can follow up more if you have questions! :slight_smile: