Unable to upload a dataset via the API using a URL requiring authentication

Hi,

I am developing a data warehouse and want to provide access to bioinformatics analysis through a Galaxy instance. I am working from a local Galaxy instance.

I am stuck at the step of uploading a dataset. I want to use the put_url function to upload a dataset using a URL from the data warehouse. However, to access this dataset, I need to authenticate to the data warehouse with a token.

I provided the token in the GalaxyInstance headers. The following code works fine, but it means downloading the dataset content locally and then sending it back to Galaxy.

from bioblend import galaxy
gi = galaxy.GalaxyInstance(url=<local-url>, key=API_KEY)

gi.json_headers["Authorization"] = "Bearer " + data_warehouse_apiKey
gi.json_headers["Accept"] = "*/*"

response = gi.make_get_request(<dataset-url>)
gi.tools.put_url(response.content.decode(), <hist_id>, file_name = "test_dataset")

I get an “HTTP Error 401” when I run:

from bioblend import galaxy
gi = galaxy.GalaxyInstance(url=<local-url>, key=API_KEY)

gi.json_headers["Authorization"] = "Bearer " + data_warehouse_apiKey
gi.json_headers["Accept"] = "*/*"

gi.tools.put_url(<dataset-url>, <hist_id>, file_name = "test_dataset")

I tried with a dataset available in Zenodo, which does not need authentication, and the put_url function works fine.

How can I upload a dataset to a Galaxy history using a URL requiring authentication?

Thanks in advance,

Agnès Barnabé

Hi @agnesbrnb

Any dataset routed through the application web interface’s Upload tool by URL needs to have a “public” link for the URL and to directly point to a file (no intermediate handshakes). That is how the “galaxy” user accesses the data file, assigns your account permissions, and keeps it private once in Galaxy.

There are two common ways around this, with the first probably being what you are looking for.

  1. Admin method: Load files into a Data Library as an administrator using one of the other methods explained here. → Hands-on: Data Libraries / Data Libraries / Galaxy Server administration

    Once the data is in a library, you can set the sharing permissions for groups of users or for all users, then they can import the datasets into a history to work with them.

    Users find and select this data under the Libraries link in the left Activity bar (most public servers have this enabled, if you want to see what it looks like). We use this in tutorials when training, and you could also put up a custom guide on your server for any special instructions. It works about the same as the other “copy datasets” functions.

  2. User method: Configure one of the User preferences → Manage Your Remote File Sources.

    This is where the credentials are saved, and you can navigate the resources (per account) under Upload → Choose remote files.

Both of these avoid exposing credentials through insecure URLs.
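If neither option is available yet, the download-then-upload workaround from the question can be tightened a little. The likely reason for the 401 is that headers set on the bioblend client are only sent to Galaxy itself, so Galaxy still fetches the dataset URL without the token. Below is a minimal sketch, not an official recipe: the helper names fetch_with_token and upload_protected_url are hypothetical, and it assumes the data warehouse accepts a Bearer token in the Authorization header, as in the original code. It streams the file to disk and then uses bioblend's tools.upload_file, so binary data is not passed through decode().

```python
import os
import shutil
import tempfile
import urllib.request


def fetch_with_token(url, token, opener=urllib.request.urlopen):
    """Stream `url` into a temporary file and return the file's path.

    Assumption: the remote server accepts a Bearer token.
    `opener` is only a hook so the function can be tested offline.
    """
    request = urllib.request.Request(
        url,
        headers={"Authorization": "Bearer " + token, "Accept": "*/*"},
    )
    with opener(request) as response, tempfile.NamedTemporaryFile(delete=False) as tmp:
        # Copy in chunks so large datasets are never held in memory at once.
        shutil.copyfileobj(response, tmp)
        return tmp.name


def upload_protected_url(gi, url, token, history_id, file_name,
                         opener=urllib.request.urlopen):
    """Download a token-protected file, then upload it to a Galaxy history.

    `gi` is expected to be a bioblend GalaxyInstance (or anything exposing
    tools.upload_file with the same arguments).
    """
    path = fetch_with_token(url, token, opener)
    try:
        return gi.tools.upload_file(path, history_id, file_name=file_name)
    finally:
        # Remove the local copy whether or not the upload succeeded.
        os.remove(path)


# Hypothetical usage, with the same placeholders as in the question:
#   from bioblend import galaxy
#   gi = galaxy.GalaxyInstance(url=<local-url>, key=API_KEY)
#   upload_protected_url(gi, <dataset-url>, data_warehouse_apiKey,
#                        <hist_id>, "test_dataset")
```

The token is only ever sent to the data warehouse, and the Galaxy client keeps its default headers. The dataset still travels through the machine running the script, so this is a stopgap rather than a replacement for a Data Library or a configured remote file source.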

I hope this helps, and we can follow up more if you have questions! :slight_smile:

Hi,

Thanks for your answer.
I couldn’t find the Manage Your Remote File Sources option in the User preferences. I only have these options:

I found a configuration file named file_sources_conf.yml. Is this what you mean? It seems to define remote file sources at the global level rather than for a specific user.


Hi @agnesbrnb

Yes, this needs to be configured. This changed with the 25.0 release, but those changes are mostly about the underlying handling, a larger number of plugins, and updated display and find functions, rather than the configuration files themselves.

We have a tutorial (pending an update!) that covers Dropbox. Maybe it is helpful? The same basic steps apply to any remote file source; only the plugin connection details differ, depending on the source and what it expects for a handshake.

We have documentation here about how all of these files are connected. These are automatically updated per release since the content reflects the actual files in the release.


A short path through the nesting is this:

  1. The top level toggles to activate this in the UI for users are here.

  2. Then the details are configured in here, along with the per-plugin files these blocks are referencing. Scroll down in the file to see the options and individual files per plugin.

    The user level block is right below here.

  3. Then the per-plugin configurations are in here.

In short, you will want to set the top level to activate this functionality at all, then choose which plugins to activate, then provide the details for those plugins. Only those you configure will appear in the UI for users.

Please give this a try, and ask questions if you need more help! Most of the team has been working on the release and was away at our yearly conference, but we are all back now and can be more responsive. :slight_smile: