Hi, I’m trying to use PanTA, Panaroo, and Roary with the dataset provided by the PanTA paper (Efficient inference of large prokaryotic pangenomes with PanTA). I took the three-species dataset (a tar.gz file) and converted it to a .zip file to upload it to Galaxy.eu. There, I unzipped it using the ‘Unzip a file’ tool and then changed the datatype of the resulting collection to GFF3.
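For the tar.gz-to-zip conversion step, a small script can repack the archive without extracting it to disk first. This is just a minimal sketch of one way to do it; the filenames here are hypothetical placeholders, not the actual dataset names:

```python
import tarfile
import zipfile

def targz_to_zip(src, dst):
    """Repack a .tar.gz archive as a .zip (e.g. for upload to Galaxy)."""
    with tarfile.open(src, "r:gz") as tar, \
         zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zf:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            if f is not None:  # skip directories and links
                zf.writestr(member.name, f.read())

# Hypothetical filenames; substitute the actual archive name.
# targz_to_zip("three_species.tar.gz", "three_species.zip")
```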
I am able to pass this collection (with 600 GFF3 files) to each tool, but PanTA and Panaroo don’t work as expected. Panaroo outputs an empty list with 0 datasets, and PanTA says that there must be at least 2 samples, even though the input has about 600 GFF3 files.
Here is the link to the history I am using to test this: Galaxy
I would appreciate it if anyone could help me with this issue, as I am new to Galaxy.
Thanks in advance,
Fernando Martin Garcia
admin edit: fail - “a list with 0 datasets” & “Exception: There must be at least 2 samples” - Inputs not recognized?
I can’t see how the jobs were set up (they weren’t fully shared), but I can see the input collection (both copies). I’m wondering if the sample names (collection element identifiers) were truncated by the tools due to the format.
I’ve reorganized the samples with group tags (for the batch groups) and simplified the element identifiers (sample names), plus pushed the full original name into a general-purpose tag (an optional “name” tag – you could change this to something else).
I’ve done that for the full batch, then created a downsampled set (5 samples) for you to test with.
Would you like to run some tests to see if the reorganization was enough to resolve the immediate sample-interpretation error? I didn’t notice anything else special about the GFF3 files – they look OK to me on a first pass.
Then, if the job(s) fail again and you don’t want to share the actual datasets in the history, please capture a screenshot of the job Details view so I can see the tool input/parameter configuration and the full job logs, to replicate the job and examine the technical details. We can pull in the EU administrators if this turns out to be a server issue, but I don’t think that is needed yet.
How to use this history: you can just review it, or import it to see what I’ve done with the Apply Rules tool and a few other Collection Operations tools. Remember that you can copy datasets between histories (copying the top-level collection automatically pulls in the elements/files), and use the rerun icon to bring up the original tool form, change the input, and run it yourself on any other subsets you want to try, if the downsampling/tagging approach is interesting to you (and it works!). The icons for collections are at the top after clicking into the collection. Then purge anything not needed once you’re done. Tutorials for these tools are linked at the bottom of the tool forms, but please ask if anything is not clear and I can point you to specific resources.
Great! Let’s start there.
Update: I found one of the older jobs in the deleted set. I’ve started up a test run with the downsample reorganized collection to see what happens! All is in the same history above.
Yes, the issue was with the collection identifiers. The last test in the shared history is still waiting to run but I’m expecting it to be successful, or to at least present with a different error (content, not format).
The exact process will be different for everyone, but the instructions above (extracting and directly applying new identifiers for those who prefer text manipulation utilities) and the examples in the shared history above (using another method) can get you started.
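As one example of the “text manipulation utilities” route, the sketch below renames a directory of GFF3 files to short sequential identifiers before upload and returns the original-to-simplified mapping so the full names aren’t lost. The naming scheme (`sample_001`, …) and the function name are my own assumptions, not something prescribed by the tools:

```python
import os

def simplify_gff3_names(directory):
    """Rename .gff/.gff3 files to short sequential identifiers.

    Returns a dict mapping each original filename to its new name,
    which you can keep (e.g. as a tag or a TSV) to recover the
    original sample names later.
    """
    mapping = {}
    files = sorted(f for f in os.listdir(directory)
                   if f.endswith((".gff3", ".gff")))
    for i, name in enumerate(files, 1):
        ext = ".gff3" if name.endswith(".gff3") else ".gff"
        new = f"sample_{i:03d}{ext}"  # hypothetical scheme
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new))
        mapping[name] = new
    return mapping
```

Sorting first keeps the numbering deterministic, so reruns on the same directory contents produce the same mapping.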
Collection 4331 with the rerun icon to bring up the Apply Rules tool
Thank you very much for the quick and precise reply!
It works now: both Panaroo and PanTA seem to be running fine with the new identifiers. I tend to do this step a bit differently, though, and wanted to ask whether my approach to changing the identifiers is also okay?