There was a problem in creating a paired read collection using API

Hello, I am very anxious, please help, thank you. I am using bioblend to connect to the usegalaxy API, and my workflow input is a Collection of paired reads. When the paired collection is generated, I find that it contains 2 pairs instead of 1, which means it cannot be connected to the Collection of paired reads input. Part of my Python code is as follows:
dataset_1 = gi.tools.upload_file(Input1_forward, history_id=history_id, file_type='fastqsanger.gz')
dataset_1_id = dataset_1['outputs'][0]['id']

dataset_2 = gi.tools.upload_file(Input1_reverse, history_id=history_id, file_type='fastqsanger.gz')
dataset_2_id = dataset_2['outputs'][0]['id']

collection_payload = {
    'collection_type': 'list:paired',
    'name': 'Paired Collection',
    'element_identifiers': [{'name': 'forward', 'src': 'hda', 'id': dataset_1_id},
                            {'name': 'reverse', 'src': 'hda', 'id': dataset_2_id}]
}

uploaded_collection = gi.histories.create_dataset_collection(history_id, collection_payload)

inputs = {
    'input1': {'src': 'hdca', 'id': uploaded_collection['id'], 'name': 'Collection of paired reads', 'format': 'fastqsanger.gz',
               'paired': True}
}

workflow_id = 'c0d6e75e9e3f9763'
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs)
gi.workflows.wait_for_invocation(invocation['id'])




I’m sorry, it’s not quite clear what the problem is you’re encountering.

Hello, I am trying to create a Collection of paired reads using the API. However, when I attempt to create it with a pair of paired-end data, I noticed that it is being uploaded as a list with 2 pairs. As a result, I am unable to proceed with the “Collection of paired reads” step. Could you please assist me in resolving this issue?

You’ve created a list:paired collection; if your workflow input is a list:paired or paired input, this should just work. Make sure this works correctly with a manually built collection, just to confirm that you’ve set the data type correctly and that your workflow input accepts list:paired collections.
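One quick way to sanity-check what the API call actually built is to look at the collection record Galaxy returns. This is only a sketch: `describe_collection` is a hypothetical helper, and the dict shape assumes the response of bioblend’s `gi.histories.show_dataset_collection`.

```python
def describe_collection(collection):
    """Summarize a dataset-collection dict: the declared collection_type
    and how many top-level elements (pairs, for list:paired) it holds."""
    return {
        'collection_type': collection['collection_type'],
        'n_elements': len(collection['elements']),
    }

# Against a live Galaxy this would be used roughly like (placeholders):
#   collection = gi.histories.show_dataset_collection(history_id, collection_id)
#   describe_collection(collection)
```

For a correctly built list with one pair, you would expect `n_elements` to be 1, not 2.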

Thank you for your reply. I use the bioblend API, and the 'collection_type' is 'list:paired'. Isn’t that right? Why is the Paired Collection a list with 2 pairs instead of a list with 1 pair after my data is uploaded? My workflow input is a Collection of paired reads, but I can’t upload a list with 1 pair. If I use the usegalaxy UI, I can build a list with 1 pair and feed it to the workflow. Part of my code is as follows:

history_name = 'MetaWin Results'
history_id = gi.histories.create_history(name=history_name)['id']

dataset_1 = gi.tools.upload_file(Input1_forward, history_id=history_id, file_type='fastqsanger.gz')
dataset_1_id = dataset_1['outputs'][0]['id']

dataset_2 = gi.tools.upload_file(Input1_reverse, history_id=history_id, file_type='fastqsanger.gz')
dataset_2_id = dataset_2['outputs'][0]['id']

collection_payload = {
    'collection_type': 'list:paired',
    'name': 'Paired Collection',
    'element_identifiers': [{'name': 'forward', 'src': 'hda', 'id': dataset_1_id},
                            {'name': 'reverse', 'src': 'hda', 'id': dataset_2_id}]
}

uploaded_collection = gi.histories.create_dataset_collection(history_id, collection_payload)

inputs = {
    'input1': {'src': 'hdca', 'id': uploaded_collection['id'], 'name': 'Collection of paired reads', 'format': 'fastqsanger.gz',
               'paired': True}
}

workflow_id = 'c0d6e75e9e3f9763'  
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs)
gi.workflows.wait_for_invocation(invocation['id'])

Oh, I see: it’s because you’re not providing the elements for a nested collection.

Your collection payload should look something like:

{
    'collection_type': 'list:paired',
    'name': 'Paired Collection',
    'element_identifiers': [
        {
            'collection_type': 'paired',
            'name': 'first_element',
            'src': 'new_collection',
            'element_identifiers': [
                {
                     'name': 'forward', 
                     'src': 'hda',
                     'id': dataset_1_id
                }, {
                    'name': 'reverse',
                    'src': 'hda',
                    'id': dataset_2_id
                }
            ]
        }
    ]
}

Hi! This problem has been solved. Thank you very much. But I encountered another problem, which was displayed at runtime:
Traceback (most recent call last):
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\bioblend\galaxyclient.py", line 196, in make_post_request
raise ConnectionError(
bioblend.ConnectionError: Unexpected HTTP status code: 400: {"err_msg": "Workflow cannot be run because input step '6952123' (Collection of paired reads) is not optional and no input provided.", "err_code": 0}

Part of my code is as follows:
history_name = 'MetaWin Results'
history_id = gi.histories.create_history(name=history_name)['id']

dataset_1 = gi.tools.upload_file(Input1_forward, history_id=history_id, file_type='fastqsanger.gz')
dataset_1_id = dataset_1['outputs'][0]['id']

dataset_2 = gi.tools.upload_file(Input1_reverse, history_id=history_id, file_type='fastqsanger.gz')
dataset_2_id = dataset_2['outputs'][0]['id']

collection_payload = {
    'collection_type': 'list:paired',
    'name': 'Paired Collection',
    'element_identifiers': [
        {
            'collection_type': 'paired',
            'name': 'first_element',
            'src': 'new_collection',
            'element_identifiers': [
                {
                    'name': 'forward',
                    'src': 'hda',
                    'id': dataset_1_id
                }, {
                    'name': 'reverse',
                    'src': 'hda',
                    'id': dataset_2_id
                }
            ]
        }
    ]
}

uploaded_collection = gi.histories.create_dataset_collection(history_id, collection_payload)

inputs = {
    'input1': {'src': 'hdca', 'id': uploaded_collection['id'], 'name': 'Collection of paired reads', 'format': 'fastqsanger.gz',
               'paired': True}
}

workflow_id = '1aa75989b24978e3'  
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs)
gi.workflows.wait_for_invocation(invocation['id'])

This error occurs at runtime:
Traceback (most recent call last):
File "C:\Users\dongge\PycharmProjects\宏基因组分析\meta.py", line 83, in <module>
main()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\PycharmProjects\宏基因组分析\meta.py", line 79, in main
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\bioblend\galaxy\workflows\__init__.py", line 494, in invoke_workflow
return self._post(payload, url=url)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\bioblend\galaxy\client.py", line 169, in _post
return self.gi.make_post_request(url, payload=payload, files_attached=files_attached)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\bioblend\galaxyclient.py", line 196, in make_post_request
raise ConnectionError(
bioblend.ConnectionError: Unexpected HTTP status code: 400: {"err_msg": "Workflow cannot be run because input step '6952123' (Collection of paired reads) is not optional and no input provided.", "err_code": 0}

Furthermore, I encountered another error:

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\dongge\PycharmProjects\宏基因组分析\meta.py", line 83, in <module>
main()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\gooey\python_bindings\gooey_decorator.py", line 134, in <lambda>
return lambda *args, **kwargs: func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\PycharmProjects\宏基因组分析\meta.py", line 42, in main
dataset_1 = gi.tools.upload_file(Input1_forward, history_id=history_id, file_type='fastqsanger.gz')
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\bioblend\galaxy\tools\__init__.py", line 497, in upload_file
uploader.upload()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\uploader\uploader.py", line 45, in upload
self.upload_chunk()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\uploader\uploader.py", line 59, in upload_chunk
self._do_request()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\uploader\uploader.py", line 88, in _do_request
self._retry_or_cry(error)
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\uploader\uploader.py", line 102, in _retry_or_cry
raise error
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\uploader\uploader.py", line 85, in _do_request
self.request.perform()
File "C:\Users\dongge\AppData\Local\Programs\Python\Python311\Lib\site-packages\tusclient\request.py", line 92, in perform
raise TusUploadFailed(error)
tusclient.exceptions.TusUploadFailed: HTTPSConnectionPool(host='usegalaxy.org', port=443): Max retries exceeded with url: /api/upload/resumable_upload/f354effc64c89aa22689da415ea96f78 (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2423)')))

Please have a look at API documentation for interacting with Galaxy — BioBlend 1.2.0 documentation; you’re providing an input keyed 'input1', which is not the step index.
It should probably be

inputs = {
    'Collection of paired reads': {'src': 'hdca', 'id': uploaded_collection['id']}
}

and then set inputs_by to label.

I’d also recommend taking a look at Running Galaxy workflows — Planemo 0.75.20 documentation which provides this functionality at a higher level.
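For reference, a planemo run against a remote Galaxy could look something like the sketch below. The workflow file, job file, output directory, and API key are all placeholders, and the flag names should be checked against your planemo version:

```shell
# Run a workflow on a remote Galaxy via planemo (all values are placeholders).
planemo run my_workflow.ga my_job.yml \
    --engine external_galaxy \
    --galaxy_url https://usegalaxy.org \
    --galaxy_user_key "$GALAXY_API_KEY" \
    --download_outputs \
    --output_directory ./results
```

The job file maps your workflow input labels to local files or existing datasets, so planemo handles the upload, collection building, invocation, and waiting for you.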

Hi! That’s not right either. My workflow only has one input: Collection of paired reads.

dataset_1 = gi.tools.upload_file(Input1_forward, history_id=history_id, file_type='fastqsanger.gz')
dataset_1_id = dataset_1['outputs'][0]['id']

dataset_2 = gi.tools.upload_file(Input1_reverse, history_id=history_id, file_type='fastqsanger.gz')
dataset_2_id = dataset_2['outputs'][0]['id']

collection_payload = {
    'collection_type': 'list:paired',
    'name': 'Paired Collection',
    'element_identifiers': [
        {
            'collection_type': 'paired',
            'name': 'first_element',
            'src': 'new_collection',
            'element_identifiers': [
                {
                    'name': 'forward',
                    'src': 'hda',
                    'id': dataset_1_id
                }, {
                    'name': 'reverse',
                    'src': 'hda',
                    'id': dataset_2_id
                }
            ]
        }
    ]
}

uploaded_collection = gi.histories.create_dataset_collection(history_id, collection_payload)
inputs = {
    'Collection of paired reads': {'src': 'hdca', 'id': uploaded_collection['id'], 'inputs_by': 'Collection of paired reads'}
}

workflow_id = '1aa75989b24978e3'
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs)
gi.workflows.wait_for_invocation(invocation['id'])

If you compare this with the documentation you’ll see that you’re not setting the parameters correctly.

inputs = {
    'Collection of paired reads': {'src': 'hdca', 'id': uploaded_collection['id']}
}

workflow_id = '1aa75989b24978e3'
invocation = gi.workflows.invoke_workflow(workflow_id, inputs=inputs, inputs_by="name")

Oh, thank you for your reply. God bless you. I successfully ran the workflow. I want my program to wait until the workflow is finished before it exits, but I found that gi.workflows.wait_for_invocation doesn’t seem to exist.

What I’d suggest is first polling using show_invocation until the state is scheduled, and then wait for the job states using API documentation for interacting with Galaxy — BioBlend 1.2.0 documentation and passing the invocation_id to filter for jobs belonging to your invocation.
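A minimal sketch of that polling loop (the function name and timeout handling are my own; the client is assumed to behave like `gi.invocations` on a bioblend GalaxyInstance):

```python
import time

def wait_until_scheduled(invocations_client, invocation_id, poll_seconds=5, timeout=3600):
    """Poll show_invocation until the invocation reaches the 'scheduled' state.

    invocations_client is expected to behave like gi.invocations on a
    bioblend GalaxyInstance; the timeout values are illustrative.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = invocations_client.show_invocation(invocation_id)['state']
        if state == 'scheduled':
            return state
        if state in ('failed', 'cancelled'):
            raise RuntimeError(f'invocation ended in state {state!r}')
        time.sleep(poll_seconds)
    raise TimeoutError('invocation was not scheduled within the timeout')
```

Once this returns, you would still need to wait for the individual jobs to finish, e.g. by filtering the jobs API on your invocation_id as described above.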

What should I do if I want to download the results generated by the workflow to my local machine? Do you have any good suggestions? What should my code look like?

You can look at the outputs of your invocation (as shown by show_invocation) and download them as you would any other dataset, or you could trigger an invocation export. Still, you seem to be replicating planemo run functionality; I’d really recommend that you look into it.
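As a sketch, the download step could look like this. The helper name is my own, the two clients are assumed to behave like `gi.invocations` and `gi.datasets` on a bioblend GalaxyInstance, and the `'outputs'` key layout (label mapping to `{'id': ..., 'src': 'hda'}`) is an assumption based on the show_invocation response, so check it against what your instance returns:

```python
import os

def download_invocation_outputs(invocations_client, datasets_client, invocation_id, output_dir):
    """Download every labeled output dataset of a workflow invocation.

    The clients are expected to behave like gi.invocations and gi.datasets
    (bioblend); the 'outputs' layout is an assumption, not a guarantee.
    """
    os.makedirs(output_dir, exist_ok=True)
    details = invocations_client.show_invocation(invocation_id)
    saved = []
    for label, ref in details.get('outputs', {}).items():
        path = os.path.join(output_dir, label)
        datasets_client.download_dataset(ref['id'], file_path=path,
                                         use_default_filename=False)
        saved.append(path)
    return saved
```

The jobs should be finished (state 'ok') before downloading, otherwise download_dataset may wait or fail depending on its require_ok_state setting.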

Hi! I’m sorry to bother you again. It’s true that I have a problem. I tried to use the API documentation, but it is not detailed enough; I only saw show_invocation(invocation_id: str) -> Dict[str, Any]. There is still an error in my code. Can you help me revise it, or be more specific? Thank you very much indeed. Here is my code:

import os
import time

def poll_invocation(invocation_id):
    while True:
        invocation_details = gi.invocations.show_invocation(invocation_id)
        invocation_state = invocation_details['state']
        if invocation_state == 'scheduled':
            break
        time.sleep(5)  # Wait for 5 seconds before polling again

def wait_for_jobs(invocation_id):
    while True:
        jobs = gi.invocations.get_invocation_jobs(invocation_id)
        all_jobs_completed = all(job['state'] == 'ok' for job in jobs)
        if all_jobs_completed:
            break
        time.sleep(5)  # Wait for 5 seconds before checking again

invocation_id = invocation['id']

# Poll the invocation status until it is scheduled
poll_invocation(invocation_id)

# Wait for all jobs to complete
wait_for_jobs(invocation_id)

# Get the workflow invocation details
invocation_details = gi.invocations.show_invocation(invocation_id)

# Retrieve the output datasets
output_datasets = invocation_details['output_datasets']

# Download the output datasets
for dataset in output_datasets:
    dataset_id = dataset['id']
    dataset_name = dataset['name']
    output_path = os.path.join(output_dir, dataset_name)
    gi.datasets.download_dataset(dataset_id, file_path=output_path)