How to invoke a workflow using Galaxy's REST API

Hello,

I’m a developer at IU working on the AMP project (https://github.com/AudiovisualMetadataPlatform). We are using Galaxy as our backend workflow engine. Right now we need to use Galaxy’s REST API to invoke a workflow, but I’m running into some issues. I tried 3 different ways (sending a POST request to the Galaxy server, running a Python command on localhost, and using blend4j’s WorkflowsClient), and all of them end up with the same error that the input file is blank, whether I use a dataset in a history or a file uploaded into a data library. Here are the details of what I tried (for security reasons I replaced the real key value with “ValidUserKey”).

Method 1:
Sending a POST request to http://localhost:8300/api/workflows?key=ValidUserKey using Postman or curl, with the following fields in the request body:
{
  "workflow_id": "0a248a1f62a0cc04",
  "parameters": {},
  "ds_map": {"0": {"src": "ldda", "id": "2d9035b3fc152403"}},
  "no_add_to_history": "true",
  "history": "hist_id=d413a19dec13d11e"
}
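
For reference, here is roughly the same request as a Python requests call (a minimal sketch, using the placeholder key and IDs from above):

    import requests

    GALAXY_URL = "http://localhost:8300"
    API_KEY = "ValidUserKey"  # placeholder, as above

    payload = {
        "workflow_id": "0a248a1f62a0cc04",
        "parameters": {},
        # map workflow input step 0 to a library dataset ("ldda");
        # "hda" would point at a dataset in a history instead
        "ds_map": {"0": {"src": "ldda", "id": "2d9035b3fc152403"}},
        "no_add_to_history": "true",
        "history": "hist_id=d413a19dec13d11e",
    }

    # legacy run-workflow endpoint; the API key goes in the query string
    response = requests.post(
        f"{GALAXY_URL}/api/workflows",
        params={"key": API_KEY},
        json=payload,
    )
    print(response.status_code, response.json())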

I got a valid response showing that the specified workflow was invoked, two jobs for the 2 steps in the workflow were created and queued, and the outputs were saved into the specified history. However, when I look into the outputs, the first job shows as failed with the error “tool error, An error occurred with this dataset: Specify a dataset of the required format / build for parameter input.”; the second job is suspended, since its input is the output of the first job. The log shows the following:

galaxy.jobs.runners ERROR 2019-08-28 14:44:40,932 [p:20569,w:1,m:0] [LocalRunner.work_thread-2] (106) Failure preparing job
Traceback (most recent call last):
  File "lib/galaxy/jobs/runners/__init__.py", line 218, in prepare_job
    job_wrapper.prepare()
  File "lib/galaxy/jobs/__init__.py", line 862, in prepare
    tool_evaluator.set_compute_environment(compute_environment, get_special=get_special)
  File "lib/galaxy/tools/evaluation.py", line 78, in set_compute_environment
    visit_input_values(self.tool.inputs, incoming, validate_inputs)
  File "lib/galaxy/tools/parameters/__init__.py", line 167, in visit_input_values
    callback_helper(input, input_values, name_prefix, label_prefix, parent_prefix=parent_prefix, context=context)
  File "lib/galaxy/tools/parameters/__init__.py", line 130, in callback_helper
    new_value = callback(**args)
  File "lib/galaxy/tools/evaluation.py", line 76, in validate_inputs
    value = input.from_json(value, request_context, context)
  File "lib/galaxy/tools/parameters/basic.py", line 1723, in from_json
    raise ValueError("Specify a dataset of the required format / build for parameter %s." % self.name)
ValueError: Specify a dataset of the required format / build for parameter input.

Method 2:
I ran the following command in my local Galaxy directory:

yingfeng@yingfeng-desktop:~/Work/Amp/galaxy/scripts/api$ python workflow_execute.py ValidUserKey http://localhost:8300/api/workflows 0a248a1f62a0cc04 hist_id=d413a19dec13d11e 0=hda=0d16186aaff7cbfd
Response:

{u'inputs': {},
 u'update_time': u'2019-08-29T13:29:18.827692',
 u'uuid': u'00c67111-ca61-11e9-8e0d-d89ef308f1d5',
 u'outputs': [u'33c1d4ca9f8bc33c', u'ab124a85aef33434'],
 u'history_id': u'd413a19dec13d11e',
 u'workflow_id': u'417e33144b294c21',
 u'output_collections': {},
 u'state': u'scheduled',
 u'steps': [{u'workflow_step_label': u'AudioExtraction',
             u'update_time': u'2019-08-29T13:29:18.730966',
             u'job_id': u'9b5c597dcbb59371',
             u'state': u'scheduled',
             u'workflow_step_uuid': u'd175961f-b75c-437a-ab70-9ae9c70f0a7c',
             u'order_index': 0,
             u'action': None,
             u'model_class': u'WorkflowInvocationStep',
             u'workflow_step_id': u'a7db2fac67043c7e',
             u'id': u'8c49be448cfe29bc'},
            {u'workflow_step_label': u'SilenceRemoval',
             u'update_time': u'2019-08-29T13:29:18.830730',
             u'job_id': u'a64417ff266b740e',
             u'state': u'scheduled',
             u'workflow_step_uuid': u'cc8f82fe-c594-4e20-afbc-f454a3149de3',
             u'order_index': 1,
             u'action': None,
             u'model_class': u'WorkflowInvocationStep',
             u'workflow_step_id': u'4ff6f47412c3e65e',
             u'id': u'59c76a119581e190'}],
 u'model_class': u'WorkflowInvocation',
 u'id': u'63cd3858d057a6d1',
 u'history': u'd413a19dec13d11e'}

Looking into the history, I got the same error about the input file being unspecified on the first job.

Method 3:
Using blend4j’s WorkflowsClient.runWorkflow, passing the correct WorkflowInputs as specified by the API. I got the same error about the input file being unspecified on the first job.

I’m rather blocked by this issue, as we rely on the Galaxy API to invoke workflows from our AMP app. Could you kindly give me a hint on what I did wrong? I’m sure many others have used these APIs. Could someone provide an example of how to invoke a workflow? Just to make it clear, it shouldn’t be a permission issue, as all the datasets, histories, libraries, workflows, etc. I’ve tested with are owned by the same user, and the invocation is by the same user. Thanks a lot for your advice.

Ying

Hi Ying,

Make sure that your workflow contains an input module, and please POST to the /api/workflows/<workflow_id>/invocations endpoint. If this is a relatively simple workflow that doesn’t use dataset collections, you may be able to generate an example script using https://github.com/mvdbeek/workflow2executable
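
For example, a minimal sketch with Python’s requests against that endpoint (the IDs are placeholders, and it assumes the workflow’s input dataset step has order index "0"; bioblend’s invoke_workflow wraps the same call):

    import requests

    GALAXY_URL = "http://localhost:8300"
    API_KEY = "ValidUserKey"        # placeholder
    WORKFLOW_ID = "0a248a1f62a0cc04"  # placeholder

    payload = {
        "history_id": "d413a19dec13d11e",  # placeholder history
        # keys are the order indices of the workflow's input steps;
        # "src" can be "hda" (history dataset) or "ldda" (library dataset)
        "inputs": {"0": {"src": "hda", "id": "0d16186aaff7cbfd"}},
    }

    response = requests.post(
        f"{GALAXY_URL}/api/workflows/{WORKFLOW_ID}/invocations",
        params={"key": API_KEY},
        json=payload,
    )
    response.raise_for_status()
    invocation = response.json()
    print(invocation["id"], invocation["state"])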

Hello!

Thank you so much for the answer. I realized that the missing input dataset step is the cause of the issue. I was a bit confused and misled, because even without the input dataset step I was able to run the workflow through the UI manually, and I’d be asked to specify an input file upon running. However, when one uses an API call to run the workflow, there is no input because the input step is missing.

Is there a way to enforce that a workflow has an input dataset step, for example via some Galaxy config? Since all our workflows will be invoked through the API (instead of users running them manually from the Galaxy UI), if we don’t enforce an input step when a workflow is created, we could run into this issue in production. We can of course add logic in our app to check the workflow (see the sketch below), but if there is some setting in Galaxy itself to check for this, that would be even better.
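
Something like this is what I have in mind for the check on our side (a rough sketch using the show-workflow endpoint; URL and key are placeholders):

    import requests

    GALAXY_URL = "http://localhost:8300"
    API_KEY = "ValidUserKey"  # placeholder

    def workflow_has_input_step(workflow_id):
        """Return True if the workflow defines at least one input dataset step."""
        response = requests.get(
            f"{GALAXY_URL}/api/workflows/{workflow_id}",
            params={"key": API_KEY},
        )
        response.raise_for_status()
        details = response.json()
        # The show-workflow response lists input steps under "inputs";
        # an empty dict means no input dataset step was defined.
        return bool(details.get("inputs"))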

Thanks!
Ying

I’d love to do this and remove the possibility of adding ad-hoc datasets altogether, but I think this would be a little disruptive to the community. We might add something like a check or verify or common_problems route to the API that would check for this or other sub-optimal things.