Hi @H1889
This issue was unrelated to the problems during the Sept 27-29th time frame.
As a cross test last week, I attempted to run your data against the clusters hosted at UseGalaxy.eu. That job also fails, and for the same reason. However, if I run Helixer at EU on the original genome fasta, that Genome Annotation job is successful.
I just took a closer look at your gff3 data, and I think I have isolated the immediate problem. More may be going on, but resolving this issue is the place to start. I’ll walk through the details here.
Galaxy command line (find this on the job’s Details view using the i-icon)
ln -s '/corral4/main/objects/f/d/f/dataset_fdfa69e1-e5b7-4069-a24f-d58c2b621872.dat' 'input.gff' && python -m jcvi.annotation.stats genestats 'input.gff' > '/corral4/main/jobs/070/955/70955718/outputs/dataset_7390b3aa-e935-44a9-807a-fd53bd79f86b.dat' && python -m jcvi.annotation.stats summary 'input.gff' '/corral4/main/objects/1/1/6/dataset_11639a65-cd34-4cf9-bd7d-b8cf60bc2d03.dat' 2>&1 | tail -n +3 >> '/corral4/main/jobs/070/955/70955718/outputs/dataset_7390b3aa-e935-44a9-807a-fd53bd79f86b.dat' && python -m jcvi.annotation.stats stats 'input.gff' 2>&1 | grep Mean >> '/corral4/main/jobs/070/955/70955718/outputs/dataset_7390b3aa-e935-44a9-807a-fd53bd79f86b.dat' && python -m jcvi.annotation.stats histogram 'input.gff' && pdfunite *.input.pdf '/corral4/main/jobs/070/955/70955718/outputs/dataset_54e17c40-773c-433f-a65e-61a0765944da.dat'
The base tool is hosted here → GitHub - tanghaibao/jcvi: Python library to facilitate genome assembly, annotation, and comparative genomics
The first module, jcvi.annotation.stats genestats, is triggering the error → jcvi/src/jcvi/annotation/stats.py at 1d66cac3a43a5042ccd8d7998a21131cadcb427e · tanghaibao/jcvi · GitHub
09:05:02 [gff] Indexing input.gff
09:05:22 [base] Load file transcript.sizes
09:05:22 [base] Imported 11653 records from transcript.sizes.
09:05:22 [base] Load file transcript.sizes
09:05:22 [base] Imported 11653 records from transcript.sizes.
09:05:22 [stats] A total of 11653 transcripts populated.
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 355, in <module>
    main()
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 56, in main
    p.dispatch(globals())
  File "/usr/local/lib/python2.7/site-packages/jcvi/apps/base.py", line 96, in dispatch
    globals[action]()
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 176, in genestats
    conf_class = conf_classes[transcripts[0]]
IndexError: list index out of range
What is happening:
- The tool first counts the mRNA features and generates some statistics (lengths).
- Next, it reviews the gene and exon features, reconciles them against the mRNA features, and generates a few more statistics.
- However, your gff3 file contains gene feature blocks like this:
| Seqid | Source | Type | Start | End | Score | Strand | Phase | Attributes |
|---|---|---|---|---|---|---|---|---|
| contig_1 | funannotate | gene | 426694 | 426765 | . | - | . | ID=ASPNIG_000108; |
| contig_1 | funannotate | tRNA | 426694 | 426765 | . | - | . | ID=ASPNIG_000108-T1;Parent=ASPNIG_000108;product=tRNA-Ala; |
| contig_1 | funannotate | exon | 426694 | 426765 | . | - | . | ID=ASPNIG_000108-T1.exon1;Parent=ASPNIG_000108-T1; |
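- This gene has a tRNA child instead of an mRNA child, so the tool appears to find no transcripts for it, and the transcripts[0] lookup raises the IndexError shown in the traceback. The job would fail on any server with this input.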
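To see how many genes in your file are affected, a quick check like the one below may help. It is only a sketch, not jcvi's own code: it assumes a tab-separated, nine-column gff3 and simply lists gene features that have no mRNA child. The first three example lines are the block shown above; the GENE_A lines are made up for contrast.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def genes_without_mrna(gff_lines):
    """Return IDs of gene features with no mRNA child, in file order."""
    gene_ids = []
    mrna_children = defaultdict(list)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            continue  # skip comments, directives, and malformed lines
        feature_type, attrs = cols[2], parse_attributes(cols[8])
        if feature_type == "gene" and "ID" in attrs:
            gene_ids.append(attrs["ID"])
        elif feature_type == "mRNA":
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    mrna_children[parent].append(attrs.get("ID"))
    return [gene for gene in gene_ids if not mrna_children.get(gene)]


example = [
    # The tRNA gene block from the post.
    "contig_1\tfunannotate\tgene\t426694\t426765\t.\t-\t.\tID=ASPNIG_000108;",
    "contig_1\tfunannotate\ttRNA\t426694\t426765\t.\t-\t.\t"
    "ID=ASPNIG_000108-T1;Parent=ASPNIG_000108;product=tRNA-Ala;",
    "contig_1\tfunannotate\texon\t426694\t426765\t.\t-\t.\t"
    "ID=ASPNIG_000108-T1.exon1;Parent=ASPNIG_000108-T1;",
    # Hypothetical protein-coding gene, invented for contrast.
    "contig_1\tfunannotate\tgene\t1000\t2000\t.\t+\t.\tID=GENE_A;",
    "contig_1\tfunannotate\tmRNA\t1000\t2000\t.\t+\t.\tID=GENE_A-T1;Parent=GENE_A;",
]

print(genes_without_mrna(example))  # ['ASPNIG_000108']
```

Any gene ID this reports is a candidate for triggering the same empty-list lookup.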
What to do
Removing these tRNA lines (all associated features – gene, tRNA, exon) will avoid the immediate problem.
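If you prefer a script over text manipulation tools, here is one possible sketch of that filtering step. It assumes a nine-column gff3 in which features are linked through ID and Parent attributes, as in your file. The function name and the file names in the usage comment are placeholders of mine, and a gene is only removed when tRNA features are its only children.

```python
def parse_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def strip_trna_models(gff_lines):
    """Drop tRNA features, genes whose only children are tRNAs, and their descendants."""
    records = []  # (line, feature type, ID, list of parent IDs)
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append((line, None, None, []))  # keep comments as-is
            continue
        attrs = parse_attributes(cols[8])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        records.append((line, cols[2], attrs.get("ID"), parents))

    trna_ids = {fid for _, ftype, fid, _ in records if ftype == "tRNA" and fid}
    trna_parents = {p for _, ftype, _, ps in records if ftype == "tRNA" for p in ps}
    other_parents = {
        p for _, ftype, _, ps in records if ftype not in (None, "tRNA") for p in ps
    }
    # Genes that also have a non-tRNA child are kept.
    drop = trna_ids | (trna_parents - other_parents)

    # Propagate removal to descendants (exons and anything nested below them).
    changed = True
    while changed:
        changed = False
        for _, _, fid, ps in records:
            if fid and fid not in drop and ps and all(p in drop for p in ps):
                drop.add(fid)
                changed = True

    kept = []
    for line, ftype, fid, ps in records:
        orphaned = bool(ps) and all(p in drop for p in ps)
        if ftype == "tRNA" or fid in drop or orphaned:
            continue
        kept.append(line)
    return kept


# Usage with placeholder file names (adjust to your own files):
# with open("input.gff3") as src, open("no_trna.gff3", "w") as dst:
#     dst.writelines(strip_trna_models(src))
```

Upload the filtered file to your history and rerun the stats tool on it. It is worth comparing line counts before and after to confirm that only the tRNA blocks were removed.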
You could also open a Jupyter Notebook, load the package, and run these tools directly (all modules). This involves moving data out of, and back into, a Galaxy history.
GTN tutorials for Jupyter Notebook.
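In a notebook, the four modules from the Galaxy command line could be run one at a time, roughly as sketched below. This assumes the jcvi package is installed in the notebook's environment, and the file names are placeholders. The second argument to summary stands in for whatever dataset Galaxy passed there (I am assuming the genome fasta). Note that the Galaxy job ran under Python 2.7, so a current install may behave somewhat differently.

```python
import subprocess
import sys


def build_commands(gff_path, second_input):
    """Build the four jcvi.annotation.stats calls seen in the Galaxy command line."""
    base = [sys.executable, "-m", "jcvi.annotation.stats"]
    return [
        base + ["genestats", gff_path],
        base + ["summary", gff_path, second_input],
        base + ["stats", gff_path],
        base + ["histogram", gff_path],
    ]


def run_all(gff_path, second_input):
    """Run each module separately and report which one fails."""
    for cmd in build_commands(gff_path, second_input):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd[2:]), "-> exit code", result.returncode)
        if result.returncode != 0:
            # The last stderr line is usually the exception message.
            stderr_lines = result.stderr.strip().splitlines()
            if stderr_lines:
                print("   ", stderr_lines[-1])


# Placeholder file names; replace with the files you downloaded from Galaxy:
# run_all("input.gff", "genome.fasta")
```

Running the modules separately shows which step fails, since the Galaxy job chains them with && and stops at the first error.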
This is the reformat module. It doesn’t have a Funannotate-specific conversion, but it may still be useful for seeing what format is expected. → jcvi/src/jcvi/annotation/reformat.py at 1d66cac3a43a5042ccd8d7998a21131cadcb427e · tanghaibao/jcvi · GitHub
I made a request to see if it could be wrapped for Galaxy since it does more than just the tRNA reformatting, although to make it useful for your specific data, you may want to try the tRNAscan module instead! → Request: wrap jcvi_gff_stats reformat.py as a standalone tool + add as a preprocessing option to jcvi_gff_stats (*) · Issue #7317 · galaxyproject/tools-iuc · GitHub
There is also another tool package that uses the reformat.py script (all these utilities are nested!) that you may find interesting. See in the tool panel at EU → Fix tRNA model. It parses the output of tRNA prediction (tRNAscan) and tRNA and tmRNA prediction (Aragorn). All of these tools use ever so slightly different gff3 formats, but hopefully they explain more about what to look for if an error comes up again.
So, try reformatting with other text manipulation tools and consider comparing with Jupyter and the expanded package directly to learn what these tools are expecting.
I hope this helps you to understand what is going on! Please let us know if you have any questions! 