Another issue with Genome Annotation Statistics

When I run Genome Annotation Statistics, I get a strange error, similar to one another user posted last September. The middle part of the stats is missing from the results, and I get this error instead:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/local/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 355, in <module>
    main()
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 56, in main
    p.dispatch(globals())
  File "/usr/local/lib/python2.7/site-packages/jcvi/apps/base.py", line 96, in dispatch
    globals[action](sys.argv[2:])
  File "/usr/local/lib/python2.7/site-packages/jcvi/annotation/stats.py", line 93, in summary
    fseq = s.sequence({'chr': f.chrom, 'start': f.start, 'stop': f.stop})
  File "/usr/local/lib/python2.7/site-packages/jcvi/formats/fasta.py", line 156, in sequence
    (f, self.filename)
AssertionError: feature: {'start': 45, 'chr': '616', 'stop': 1319} not in `/data/dnb12/galaxy_db/files/8/e/9/dataset_8e987715-334c-4e9f-b1f8-d36862895ebc.dat`

I’ve run Genome Annotation Statistics (jcvi) with the same gff3 before and it worked fine, but now I’ve replaced the assembly fasta with the cleaned fasta output of NCBI FCS GX. I don’t understand why this is a problem.

Hi @jaredbernard

The tool is reporting a conflict when counting up the features. Would you like to share back the history? I’ll need to see the full fasta and gff3 to diagnose correctly (as I did for the prior case).

In your case, if the gff3 worked with one version of the fasta and not another, I would start with making sure that fasta is in a very simple format: no description on the title line (less important for this but I do it as a default) and a consistent line wrapping length (80 bases).
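If it helps, here is a minimal Python sketch of that kind of clean-up, assuming the fasta fits in memory as text. The function name `normalize_fasta` and the `width` parameter are made up for illustration and are not part of jcvi or Galaxy.

```python
def normalize_fasta(text, width=80):
    """Return fasta text with bare IDs on title lines and sequence wrapped at `width`."""
    records = []  # list of (sequence ID, list of sequence lines)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Keep only the first whitespace-delimited token as the ID,
            # which drops any description on the title line.
            parts = line[1:].split()
            records.append((parts[0] if parts else "", []))
        elif records:
            records[-1][1].append(line)

    out = []
    for seq_id, chunks in records:
        seq = "".join(chunks)
        out.append(">" + seq_id)
        # Re-wrap the sequence at a consistent line length.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

You would read the fasta into a string, pass it through the function, and write the result to a new file before rerunning the tool.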

Then, I would compare the two fasta files. The error may be spurious, but I would still start with those coordinates, and possibly with the feature immediately before this one in the gff3.
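One way to make that comparison concrete is to list which sequences were dropped or changed length between the original and the cleaned fasta. This is only a sketch: `fasta_lengths` and `compare_assemblies` are hypothetical helper names, and it assumes the sequence IDs are the first token on each title line in both files.

```python
def fasta_lengths(text):
    """Map each sequence ID (first token of the title line) to its length."""
    lengths = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def compare_assemblies(original_text, cleaned_text):
    """Return (IDs missing from the cleaned fasta, {ID: (old length, new length)})."""
    before = fasta_lengths(original_text)
    after = fasta_lengths(cleaned_text)
    dropped = sorted(set(before) - set(after))
    resized = {
        seq_id: (before[seq_id], after[seq_id])
        for seq_id in before
        if seq_id in after and before[seq_id] != after[seq_id]
    }
    return dropped, resized
```

If `616` shows up in either result, the gff3 coordinates for that sequence may no longer line up with the cleaned assembly.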

Example: does this region exist in the cleaned version of your fasta? Then, as a cross check, what does the gff3 look like in that same region?

`AssertionError: feature: {'start': 45, 'chr': '616', 'stop': 1319}`
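As a rough cross check of those coordinates, something like the Python sketch below could pull the gff3 rows that overlap the region and test whether the region fits inside a sequence of a given length. The helper names `features_in_region` and `region_fits` are hypothetical, and I am assuming (not confirming) that the assertion fires when the sequence is missing from the fasta or shorter than the feature's stop coordinate.

```python
def features_in_region(gff3_text, chrom, start, stop):
    """Return (type, start, stop) for gff3 rows on `chrom` overlapping start..stop."""
    hits = []
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a 9-column feature row
        seqid, ftype = cols[0], cols[2]
        fstart, fstop = int(cols[3]), int(cols[4])
        if seqid == chrom and fstart <= stop and fstop >= start:
            hits.append((ftype, fstart, fstop))
    return hits


def region_fits(seq_length, start, stop):
    """True if the 1-based, inclusive region start..stop lies within the sequence."""
    return 1 <= start <= stop <= seq_length
```

For the feature in the error you would call `features_in_region(gff3_text, "616", 45, 1319)` and compare the result with the length of sequence `616` in the cleaned fasta.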

To rule out any minor format issue, I would probably run both files through gffread, since it can sync up the bases and the annotation when both are included.

Finally, I would review the content of the gff3 against the user guide and consider some of the other options I suggested in the other topic. The reformatting module that comes with the base tool is still waiting for a developer to work on it, but you could try running it in a notebook.

Those are my best guesses without seeing the inputs/outputs! Let us know if we can help more! :slight_smile: