Renaming files in a collection for BiG-SCAPE to avoid errors

Shreyash_04 · May 16, 2024, 10:52am

Dear Galaxy community,

I have an output collection from a tool (antiSMASH) that contains files in GenBank format. The files have the extension “.genebank”. I want to supply these files as input to another tool (BiG-SCAPE). However, it accepts only files with the “.gbk” extension.

When given the collection with .genebank names, BiG-SCAPE generates an output but does not recognize the files. I downloaded the collection, manually changed the extension to .gbk, and uploaded it again. This time the output from BiG-SCAPE was fine.

Which tool can I use to automate renaming the extension of the individual files in a collection? A .gbk option is not available in the convert or datatype options for the collection. Thank you.

jennaj · May 16, 2024, 6:43pm

Hi @Shreyash_04

Instead, updating the datatype should be enough. But you can share more details if I am misunderstanding.

See the second FAQ here for the how-to for batch changing the datatype of a collection folder of dataset files → https://training.galaxyproject.org/training-material/faqs/galaxy/#collections

There isn’t a batch way to rename the dataset name since it isn’t supposed to matter. Tools interpret the datatype instead, and you have full control over that.

Walking through this for clarity.

One tool is expecting one of these → [screenshot]

And outputs these → [screenshot]

Then the next tool is expecting one of these → [screenshot]

If Galaxy guessed the datatype wrong because of a file extension (usually during Upload), you can directly update to the correct one, and you can do that in batch on a collection. Just make sure the data content is actually a match for the datatype.

A tool should definitely assign the datatype of outputs correctly, no matter what they are named, and if it doesn’t, that would be a bug to fix and we can confirm the problem and report it.

I know of only one tool that does interpret the actual dataset name (for reasons internal to the underlying tool that can’t be worked around).

If you determine that BiG-SCAPE is actually interpreting the dataset file name, or that antiSMASH is not assigning the correct datatype to outputs, I would be curious and could review an example in a shared history. We could ask the developers about it.

Please review the above and see if it solves the problem, or share more details. Thanks!

Shreyash_04 · May 17, 2024, 7:27am

Hi Jennifer,

Thank you for the detailed reply.

Unfortunately, the batch datatype change does not offer an option to convert to the .gbk extension, so I cannot use it.

The output from antiSMASH is in GenBank format but with the extension “.genebank”. The BiG-SCAPE tool needs GenBank files as input, but with the .gbk extension. It does accept the .genebank files and produces an output, but the output is blank.

The antiSMASH output.

The BIG-SCAPE output with the .genebank input.

The number of genomes detected is 0, which means it did not recognize the input at all.

Then collection 310 was downloaded, its files were manually renamed to have the .gbk extension, and it was uploaded back to Galaxy.

BiG-SCAPE then produced an output and recognized the .gbk files.
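The manual rename step in this download/rename/upload workaround could be scripted on the local machine. This is only a minimal sketch, assuming the downloaded collection has been unpacked into a single folder; the function name `rename_extensions` and the folder path in the usage note are hypothetical, and the script changes file names only, not file content.

```python
from pathlib import Path


def rename_extensions(folder, old_ext=".genebank", new_ext=".gbk"):
    """Rename every file in `folder` whose last extension is old_ext.

    Returns the new file names, sorted. Files that would overwrite an
    existing file are skipped rather than replaced.
    """
    renamed = []
    for path in sorted(Path(folder).iterdir()):
        # Only touch regular files that end in the old extension.
        if not path.is_file() or path.suffix != old_ext:
            continue
        target = path.with_suffix(new_ext)
        if target.exists():
            # Never overwrite a file that is already there.
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

For example, `rename_extensions("antismash_download")` would rename the matching files in a (hypothetical) folder called `antismash_download`, after which the folder can be uploaded to Galaxy again.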

The problem could be solved in two ways: 1) antiSMASH names its output with the .gbk extension directly (only the naming convention needs to change; the content is fine), or 2) BiG-SCAPE is able to process .genebank files as well as .gbk files.

Please let me know should you need more details.

Kind regards
Shreyash

jennaj · May 17, 2024, 4:48pm

Hi @Shreyash_04

Thanks for posting those details! Very helpful.

I’ve started up a very simple history here to test to see if this can be reproduced.
https://usegalaxy.eu/u/jenj/h/test-bigscape

First input has both the genbank datatype and a file name that ends with .gbk
Second input has the genbank datatype but a file name that doesn’t end with .gbk

Let’s see what happens. If you want to check I have that set up right, that would be great too. A small example is helpful to the developers if this needs a change.

Update

I could reproduce the problem. Yes, the tool requires inputs like

filename.gbk

and fails with

filename

even when the correct datatype (genbank) is assigned.

The tool form does use the datatype to detect inputs in select lists. So, both the correct filename.gbk and datatype genbank are needed for a successful run.
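To illustrate why the datatype alone is not enough, here is a small sketch of how a command-line tool might pick up its inputs by file name pattern. This is not BiG-SCAPE's actual code; the pattern-based selection and the names `discover_inputs` and `demo` are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def discover_inputs(folder, pattern="*.gbk"):
    """Return the names of files in `folder` that match a glob pattern."""
    return sorted(p.name for p in Path(folder).glob(pattern))


def demo():
    """Create two files with GenBank-like content and see which is found."""
    with tempfile.TemporaryDirectory() as d:
        # Same kind of content, different file names.
        (Path(d) / "sample_a.gbk").write_text("LOCUS  sample_a\n")
        (Path(d) / "sample_b").write_text("LOCUS  sample_b\n")
        return discover_inputs(d)
```

Under that assumption, `demo()` finds only `sample_a.gbk`. The second file is skipped purely because of its name, which would match the "0 genomes detected" symptom reported above.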

I’ve opened a ticket at the IUC to see what can be done. Please feel free to add more details or comments. Ticket: “Enhancement: allow BiG-SCAPE to process inputs without a required .gbk extension in dataset filename” (Issue #6015, galaxyproject/tools-iuc on GitHub)

Thanks for all the followup!!

Shreyash_04 · May 17, 2024, 9:02pm

Hi Jennifer,

Thank you for creating a test history and confirming the problem. I also appreciate your effort in raising the issue on GitHub.

I will follow the updates on GitHub.

jennaj · May 21, 2024, 5:20pm

Hi @Shreyash_04

Since the change might take a while to implement, I wanted to suggest a way to handle this inside Galaxy.

General path

Put an “inputs” step, antiSMASH, and BiG-SCAPE into a workflow
Use the rename function on the antiSMASH output dataset, and adjust the name to have the required extension

You could even just put antiSMASH (along with an “inputs” – required for all workflows) into a workflow by itself to do this… but might as well stream the two together.

I should have mentioned this to start with. Sorry, was focusing on clarifying the root issue and direct usage, not workflow usage. In short, the whole download/rename/upload process could be skipped when using a workflow.

We have many workflow tutorials if this is new to you. Maybe start here → Hands-on: Creating, Editing and Importing Galaxy Workflows / Using Galaxy and Managing your Data

Screenshot of the function. Click on the target tool, then see the side panel. All of the options on the regular tool form will be there, plus a few workflow-specific options.

Hope this helps!

Shreyash_04 · May 22, 2024, 8:12am

Hi @jennaj

Thanks a lot for the advice.

I attempted to perform the dummy analysis using a workflow with antiSMASH and BiG-SCAPE.

I attempted to change the datatype to .gbk in the configure output section for the antiSMASH GenBank files. However, .gbk is not available as an option.

The rename function also did not work out. The main output collection was renamed, while the files inside the collection remained unchanged.

The analysis in the workflow did not work either. The BiG-SCAPE job failed.

One of the following may help me.

Is there a function or tool to rename files in a collection using a formula (where only the extension is changed)?
Can the .gbk filetype be made available in the convert datatype section for collections or files?

Thank you for taking time to help me.

Best regards
Shreyash

jennaj · May 22, 2024, 10:45pm

There are a few parts here, so I’m going to number them.

1. Datatype

For this part, I should have been clearer. Name the file

filename.gbk

and assign (or leave) the datatype as genbank.

2. “Filename” of the inputs

Correct, sorry.

The “filename” in a collection is a different attribute. These are called Element Identifiers, and they work somewhat like file names. I just tested whether modifying them was enough to get the tool to accept the inputs, and it was.

Instead of typing out how to do this, I’m going to share some Galaxy artifacts.

original testing history https://usegalaxy.eu/u/jenj/h/test-bigscape

Then I created a small workflow with example manipulation steps, along with a rerun, and what resulted. This is what you can add to your workflow, or you can adapt mine.

workflow https://usegalaxy.eu/u/jenj/w/relabel-test-bigscape
Using the original history as a source of inputs to the workflow, this is the output sent to a new history https://usegalaxy.eu/u/jenj/h/relabel-test-bigscape

Many more text manipulation tools are covered in these tutorials https://training.galaxyproject.org/training-material/search2?query=olympics. The workflow manipulations are covered in the workflow tutorials, and you can see the function used in other contexts in at least two other tutorials (see the bottom of the Relabel identifiers tool form).
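For anyone who wants to see the text manipulation spelled out, the relabeling idea can be sketched outside Galaxy as follows. This is only an illustration and not the content of the shared workflow: it assumes the element identifiers are available as a plain list and that the relabeling step is given a two-column, tab-separated table of old and new identifiers. The function names `to_gbk` and `mapping_table` are hypothetical; check the Relabel identifiers tool form on your server for the input options it actually offers.

```python
import re


def to_gbk(identifier):
    """Return the identifier with a .gbk extension.

    A trailing .genebank or .genbank is replaced; identifiers without
    an extension get .gbk appended; names already ending in .gbk are
    returned unchanged.
    """
    if identifier.endswith(".gbk"):
        return identifier
    stripped = re.sub(r"\.(genebank|genbank)$", "", identifier)
    return stripped + ".gbk"


def mapping_table(identifiers):
    """Build 'old<TAB>new' lines, one line per element identifier."""
    lines = [f"{old}\t{to_gbk(old)}" for old in identifiers]
    return "\n".join(lines) + "\n"
```

Writing the result of `mapping_table([...])` to a file gives a small tabular dataset that could be uploaded and used to relabel the collection elements, so that each element identifier ends in .gbk while the datatype stays genbank.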

Hope this is the solution but please review and try it out

Shreyash_04 · May 24, 2024, 7:44am

Hi @jennaj ,

Thanks a lot for the solution. It worked perfectly. I tested it with the small dataset as well as with the larger files. It works well and has produced satisfactory results.

Thanks a lot.

jennaj · May 24, 2024, 7:21pm

Very happy to hear that! Happy science!
