Getting a high number of pseudogenes

I assembled a whole genome from Nanopore reads with Flye and annotated it with Prokka. The assembly has 30x coverage, but the annotation contains more than 2,000 pseudogenes. Can anyone suggest a tool to remove the pseudogenes or fix the frameshifts in my assembly file? Do I need to polish the assembly? Any suggestions, please…

Welcome, @l.r.l.s.k

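Yes, you will probably need to polish the assembly, or back up a step and do more QA on the reads before assembling. Uncorrected indel errors in a Nanopore-only assembly commonly show up as frameshifts, which annotation then reports as apparent pseudogenes.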
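As a rough illustration of the kind of read QA meant here, the sketch below computes the read N50 and an estimated coverage from a FASTQ file. It is not taken from the tutorials. It assumes a plain four-lines-per-record FASTQ (optionally gzipped), and the function names, the file name, and the genome size in the usage note are all hypothetical.

```python
import gzip
from pathlib import Path


def read_lengths(fastq_path):
    """Return the sequence length of every record in a FASTQ file.

    Assumes the common four-lines-per-record layout (no wrapped sequences).
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    lengths = []
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                lengths.append(len(line.strip()))
    return lengths


def n50(lengths):
    """Length L such that reads of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # no reads at all


def estimated_coverage(lengths, genome_size):
    """Total sequenced bases divided by the expected genome size (in bases)."""
    return sum(lengths) / genome_size
```

For example, `estimated_coverage(read_lengths("reads.fastq"), 2_900_000)` would give the approximate depth for a genome of about 2.9 Mb. Both the file name and the genome size are placeholders, so substitute your own values.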

Tutorials with examples are here → Assembly / Tutorial List

A likely place to start → Hands-on: Genome Assembly of MRSA from Oxford Nanopore MinION data (and optionally Illumina data) / Assembly

And annotation tutorials are here → Genome Annotation / Tutorial List

We can follow up on any questions you have about those methods.

Let’s start there! :slight_smile: