I have an output collection from a tool (antiSMASH) that contains files in GenBank format. The files have the extension “.genebank”. I want to supply these files as input to another tool (BiG-SCAPE). However, it accepts only files with the “.gbk” extension.
When given the collection with .genebank names, BiG-SCAPE generates an output but does not recognize the files. I downloaded the collection, manually edited the extension to .gbk, and uploaded it again. This time the output from BiG-SCAPE was fine.
Which tool can I use to automate renaming the extension for the individual files in a collection? A .gbk option is not available in the convert or datatype options for the collection. Thank you.
There isn’t a batch way to rename the dataset name since it isn’t supposed to matter. Tools interpret the datatype instead, and you have full control over that.
If Galaxy guessed the datatype wrong because of a file extension (usually during Upload), you can directly update to the correct one, and you can do that in batch on a collection. Just make sure the data content is actually a match for the datatype.
A tool should definitely assign the datatype of outputs correctly, no matter what they are named, and if it doesn’t, that would be a bug to fix and we can confirm the problem and report it.
I know of only one tool that does interpret the actual dataset name (for reasons internal to the underlying tool that can’t be worked around).
If you determine that BiG-SCAPE is actually interpreting the dataset file name, or that antiSMASH is not assigning the correct datatype to outputs, I would be curious and could review an example in a shared history. We could ask the developers about it.
Please review the above and see if it solves the problem or share more details please. Thanks!
Unfortunately, the batch datatype change does not offer an option to convert to the .gbk extension, so I cannot use that.
The output from antiSMASH is in GenBank format but with the extension “.genebank”. BiG-SCAPE needs GenBank files as input, but with the .gbk extension. It does accept the .genebank files and produces output, but the output is blank.
The problem could be solved in two ways: 1) antiSMASH writes its output with the .gbk extension directly (only the naming convention needs to change; the content is fine), or 2) BiG-SCAPE processes .genebank files as well as .gbk files.
First input has both the genbank datatype and a file name that ends with .gbk
Second input has the genbank datatype but a file name that doesn’t end with .gbk
Let’s see what happens. If you want to check I have that set up right, that would be great too. A small example is helpful to the developers if this needs a change.
Update
I could reproduce the problem. Yes, the tool requires inputs like
filename.gbk
and fails with
filename
even when the correct datatype (genbank) is assigned.
The tool form does use the datatype to detect inputs in select lists. So, both the correct filename.gbk and datatype genbank are needed for a successful run.
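If the download/rename/upload route is still needed in the meantime, the manual renaming step could be scripted locally. This is only a minimal sketch that runs on your own computer, not inside Galaxy; the function name `rename_extension` and the folder path are hypothetical, and it assumes the downloaded collection has been unpacked into a single folder.

```python
from pathlib import Path


def rename_extension(folder, old_ext=".genebank", new_ext=".gbk"):
    """Rename every file in `folder` that ends in old_ext so it ends in new_ext.

    Returns the new file names, in sorted order. File contents are untouched.
    """
    renamed = []
    # sorted() builds the full listing first, so renaming while looping is safe.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.name.endswith(old_ext):
            target = path.with_name(path.name[: -len(old_ext)] + new_ext)
            if target.exists():
                # Skip rather than overwrite a file that already has the new name.
                continue
            path.rename(target)
            renamed.append(target.name)
    return renamed
```

Calling `rename_extension("antismash_download")` (a hypothetical folder name) would rename the files in place, after which the folder can be uploaded again as a collection.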
Since the change might take a while to implement, I wanted to suggest a way to handle this inside Galaxy.
General path
Put an “inputs”, antiSMASH and BiG-SCAPE into a workflow
Use the function to rename the antiSMASH output dataset, and adjust for the required extension
You could even just put antiSMASH (along with an “inputs” – required for all workflows) into a workflow by itself to do this… but might as well stream the two together.
I should have mentioned this to start with. Sorry, was focusing on clarifying the root issue and direct usage, not workflow usage. In short, the whole download/rename/upload process could be skipped when using a workflow.
Screenshot of the function. Click on the target tool, then see the side panel. All of the options on the regular tool form will be there, plus a few workflow-specific options.
The “filename” in a collection is a different attribute. Those are called Element Identifiers, and they work somewhat like a file name. I tested whether modifying those was enough for the tool to accept the renaming, and it was.
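To make the identifier change concrete, here is a rough sketch of the text manipulation involved: strip a GenBank-style suffix if one is present, then append .gbk. The function names and example identifiers are hypothetical, and the two-column old/new table is only an assumption about how a mapping might be handed to a relabeling step; check the tool form for the format it actually expects.

```python
import re


def to_gbk_identifier(identifier):
    """Return the element identifier with a .gbk suffix.

    Identifiers that already end in .gbk are returned unchanged.
    """
    if identifier.endswith(".gbk"):
        return identifier
    # Drop an existing GenBank-style suffix, if any, before adding .gbk.
    stem = re.sub(r"\.(genebank|genbank|gb)$", "", identifier)
    return stem + ".gbk"


def mapping_table(identifiers):
    """Build tab-separated lines of 'old identifier<TAB>new identifier'."""
    return "\n".join(f"{old}\t{to_gbk_identifier(old)}" for old in identifiers)
```

For example, `to_gbk_identifier("cluster1.genebank")` gives `cluster1.gbk`, and an identifier with no extension, such as `cluster2`, becomes `cluster2.gbk`.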
Instead of typing out how to do this, I’m going to share some Galaxy artifacts.
Then I created a small workflow with example manipulation steps, along with a rerun, and what resulted. This is what you can add to your workflow, or you can adapt mine.
Many more text manipulation tools are covered in these tutorials https://training.galaxyproject.org/training-material/search2?query=olympics and the workflow manipulations are in the workflow tutorials. You can also see the function used in other contexts in at least two other tutorials (see the bottom of the Relabel identifiers tool form).
Thanks a lot for the solution. It worked perfectly. I tested it with the small dataset as well as with the larger files. It works well and has produced satisfactory results.