Panaroo: GFF files in a Collection? Toggle option to retain file extensions in element sample identifiers

My bacterial genome files are in GFF format and their names all end with .gff, but I can’t run Panaroo with them. How can I get Panaroo to accept them? Thank you so much.

Welcome @Ph_c_Mai

Hopefully we can help!

I’m guessing that your files are in a collection, correct? If so, then the collection identifiers need to have the .gff extension added on.

Panaroo is a bit special in that it uses the file name for the sample labels, and the original tool is hard-coded to expect the extension! There is an open request to make this work differently, but it is still pending implementation.

We have an example in this topic showing how to manipulate your collection identifiers. → Panaroo & PanTA: sample recognition troubleshooting - #3 by jennaj
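As a rough illustration of what that manipulation achieves, here is a minimal Python sketch. It takes a list of element identifiers (for example, copied from the output of the Extract element identifiers tool) and builds a two-column old/new table in which every new identifier ends with .gff. The function names and the idea of using the table as a relabeling mapping are assumptions for illustration only; the linked topic shows the actual steps inside Galaxy.

```python
def add_gff_extension(identifiers, extension=".gff"):
    """Pair each identifier with a version that ends with the extension.

    Identifiers that already end with the extension are left unchanged,
    so the function is safe to run on a mixed list.
    """
    pairs = []
    for name in identifiers:
        if name.lower().endswith(extension):
            new_name = name
        else:
            new_name = name + extension
        pairs.append((name, new_name))
    return pairs


def to_tabular(pairs):
    """Format (old, new) pairs as tab-separated lines, one pair per line."""
    return "\n".join(f"{old}\t{new}" for old, new in pairs)


if __name__ == "__main__":
    # Hypothetical identifiers: two lost their extension, one kept it.
    ids = ["sample1", "sample2.gff", "moresamples3"]
    print(to_tabular(add_gff_extension(ids)))
```

The printed table lists the current identifier in the first column and the identifier Panaroo expects in the second.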

This can be a bit complicated the first time! Please give this a try, and if you get stuck you are welcome to share your history. We can help you to make the adjustment, then share it back, and then you’ll have a template manipulation to guide you.

Or, if I guessed wrong, you are still welcome to share your history, or screenshots may be enough. Capture the full job details page for the error (expand the logs too!), a few of the input datasets expanded, and, if the data is in a collection, the output of the Extract element identifiers tool.

So, you have some choices! Let us know! :slight_smile:

Thanks, Jenna.

I’m not good at bioinformatics and regard myself as low-tech. Please kindly correct if I’m wrong.

I tested the first 11 bacterial genomes, all of which are .gff files. However, Panaroo wouldn’t run. Here is the screen capture.

I’m looking forward to your reply, Jenna.

Hi @Ph_c_Mai,

It seems you have GFF files in your history, but Panaroo takes a collection of GFF files. It does not “see” individual files, only collections, at least in the current version.

Maybe check this FAQ: Creating a dataset collection

Create a collection (Flat list) via Advanced settings and do not remove file extensions.

Hope that helps.

Kind regards,

Igor

Hi Igor,

I tried but was not successful, so I used Roary instead.

Thank you so much for your valuable support.

Hi @Ph_c_Mai,

If you share the history that contains your GFF files, I can check why the collection does not work with Panaroo. To share a history: click the history options (three horizontal bars icon) in the top right corner of the history panel > Share & Manage Access > in the middle window, turn on the Make History Accessible switch > copy the URL and paste it into your reply.

Kind regards,

Igor

Hi @Ph_c_Mai

The advice from @igor is how we can help you. My guess is that the collection’s element identifiers had the .gff portion of the sample names removed when the collection was created.

Some details are included in this topic. → Panaroo error with GFF3 input TIP: include .gff in the file names and collection identifiers

I’ll also clarify here. The option in the collection building tool to remove the extension is toggled to Yes by default, but it can be set to No to retain it.

  1. If your files are named in a format like:

    sample1.gff
    sample2.gff
    moresamples3.gff

  2. Then create a List Collection with this option set to retain the full file names.

    Remove file extensions? (default: Yes)

    Toggle to No for data inputs to Panaroo


If you want to give this another try and let us know, we would be interested! Or, maybe this will help the next person. Panaroo is a bit special and the collection creation is highly specific. Most other tools will not have the same requirement (defaults are fine for most use cases).

Thanks! :slight_smile:

Thanks Jenna and Igor.

I will give it a try this weekend. Sorry for the slow reply, work at the company has been overwhelming.

Dear Igor and Jenna,

here is my URL: Galaxy

Thanks for your valuable support.

I’m so sorry, Jenna.

I couldn’t run Panaroo at first because it requires a collection. After creating the collection, Panaroo runs, but it ends with errors.

Here is my URL: Galaxy

Thank you for your patience and support.

Hi @Ph_c_Mai

I tested Panaroo on Galaxy Europe and don’t see any issue with it, but it seems it cannot handle your data. The error message says:

ValueError: Invalid gene sequence!

Were the files produced by Prokka? Panaroo was designed for Prokka data, and it might not handle all gff files produced by other software.
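If you are not sure where the files came from, a quick look inside them can help. Prokka writes GFF3 files that carry the contig sequences in a `##FASTA` section after the annotation lines. The minimal Python sketch below counts CDS features and checks for that section. It is only a sanity check under that assumption, not a diagnosis of this particular error; the function names are made up for illustration, and a file can pass this check and still fail in Panaroo.

```python
def inspect_gff_lines(lines):
    """Summarize GFF3 content: CDS feature count and embedded FASTA records."""
    has_fasta = False
    cds_count = 0
    fasta_records = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not has_fasta:
            if line.startswith("##FASTA"):
                # Everything after this directive is sequence data.
                has_fasta = True
            elif line and not line.startswith("#"):
                # Annotation line: column 3 holds the feature type.
                fields = line.split("\t")
                if len(fields) >= 3 and fields[2] == "CDS":
                    cds_count += 1
        elif line.startswith(">"):
            fasta_records += 1
    return {
        "has_fasta": has_fasta,
        "cds_count": cds_count,
        "fasta_records": fasta_records,
    }


def inspect_gff_file(path):
    """Run the same summary on a file on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return inspect_gff_lines(handle)


if __name__ == "__main__":
    import sys

    # Usage: python check_gff.py sample1.gff sample2.gff
    for gff_path in sys.argv[1:]:
        print(gff_path, inspect_gff_file(gff_path))
```

A file that reports no `##FASTA` section or zero CDS features is unlikely to be Prokka-style output, which would be worth mentioning when you reply.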

Kind regards,

Igor