I am trying to use BiG-SCAPE on the galaxy.eu server. When I upload antiSMASH results for entire genomes, in gbk format, the tool runs successfully, but the results do not make sense. Although the genomes contain many BGCs, the tool reports only about 1 BGC and 1 BGC family per genome. Has anyone been able to run this tool on gbks of entire antiSMASH outputs?
That will take some time to run. Maybe you can spot where the problem is in this simple case, or spot a technical difference between your antiSMASH results and this data? We can dig into this more; this is just to get us started.
If you have a smaller test example to share back, too, that would also help. How to share work is explained in the banner topic of this forum. Smaller yet representative is best.
Update: I decided to pull the original genome sequences for those same genomes and run antiSMASH on them in Galaxy. I’ll then run BiG-SCAPE on those results. This will provide a direct comparison between the public, known annotation and the annotation predicted by running these two tools.
I am having the exact same issue. antiSMASH detects many BGCs in each of my genomes, but BiG-SCAPE predicts only 1 BGC per genome and 1 per family in my results.
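One quick way to put numbers on the mismatch is to count the BGC features in each antiSMASH gbk and compare that with what BiG-SCAPE reports. This is only a rough sketch: it assumes the feature key is `region` (as in recent antiSMASH releases) or `cluster` (older ones), and the sample text is made up for illustration, so please check it against your own files.

```python
import re

# In a GenBank feature table, feature keys start at column 6 (after 5 spaces).
# Assumption: BGCs are annotated as "region" (recent antiSMASH) or "cluster" (older).
REGION_RE = re.compile(r"^ {5}(?:region|cluster) +\S", re.MULTILINE)


def count_regions(gbk_text):
    """Count top-level region/cluster features in GenBank-formatted text."""
    return len(REGION_RE.findall(gbk_text))


# Hypothetical excerpt, not taken from a real antiSMASH run.
sample_gbk = """FEATURES             Location/Qualifiers
     source          1..50000
     region          1..20000
                     /product="NRPS"
     protocluster    100..19000
     region          30000..45000
                     /product="T1PKS"
"""

if __name__ == "__main__":
    print(count_regions(sample_gbk))
```

To use it on a real file, read the file with `open(path).read()` and pass the text to `count_regions`. If the count per genome is much higher than the number of BGCs in the BiG-SCAPE output, that confirms the input is being under-counted rather than the genomes being sparse.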
Did you adjust your anchoring set of Pfam domains? You can test those by specifically asking the tool to consider one or more that resulted from the upstream tool. Adjusting other parameters can help as well. We won’t be able to help with the parameters here too much, but the guide can be found at the developer’s site:
The Galaxy implementation is still experimental, and only available at a single server, but it should still work “about” the same! I added more runs to the original testing history above, with some more of the defaults plus a custom anchor file (and a text PF query against the raw files as a comparison for scope). The failures in the history are all due to parameter issues plus sparse-content issues.
My running idea list: I think the tool form could make the significance of the anchors much more prominent (maybe moved to the top of the form), since the output scope will always rely on this. Other ideas: detailed logs should be output by default, the HMM reference could be pre-loaded as a native index, more items could be warnings (or local to a sample within a larger collection grouping), and use of the MIBiG database should probably be an on-by-default requirement (unless the HMM database input minimum changes).
If you have ideas, you are welcome to drop them into this forum topic or, more directly, you could open an issue ticket at the IUC repository for your “nice-to-have feedback” list. Working with scientists to tune tools is really important for our project!
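For anyone who wants to repeat the text PF query outside of Galaxy, here is a minimal sketch of the idea. It assumes Pfam accessions appear in the raw files as `PF` followed by five digits (optionally with a version suffix); the function names, the sample text, and the anchor list are made up for illustration and are not part of BiG-SCAPE.

```python
import re
from collections import Counter

# Assumption: accessions look like "PF00109" or "PF00109.26" in the raw text.
PFAM_RE = re.compile(r"\bPF\d{5}\b")


def count_pfam_accessions(text):
    """Return a Counter of Pfam accessions (version suffix ignored)."""
    return Counter(PFAM_RE.findall(text))


def missing_anchors(text, anchors):
    """Return the anchor accessions that never appear in the text."""
    found = count_pfam_accessions(text)
    return [a for a in anchors if a not in found]


# Hypothetical excerpt from an annotated gbk file.
sample_text = '''
     /db_xref="PF00109.26"
     /db_xref="PF02801.22"
     /db_xref="PF00109.26"
'''

# Hypothetical anchor list; substitute the accessions from your own anchor file.
example_anchors = ["PF00109", "PF00668"]

if __name__ == "__main__":
    print(count_pfam_accessions(sample_text))
    print(missing_anchors(sample_text, example_anchors))
```

If most of your anchors come back as missing from the raw files, the anchor set is a likely reason for the narrow output scope, and it is worth choosing anchors from the domains that the upstream tool actually reported.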