Extract list of genes for annotated contig

Hi,

I processed an E. coli genome using a standard pipeline (Unicycler) and annotated it using Prokka. The first few lines of the GFF file are pasted below:
##gff-version 3
##sequence-region 1 1 2860445
##sequence-region 2 1 1613318
##sequence-region 3 1 142357
##sequence-region 4 1 17570
##sequence-region 5 1 3029
##sequence-region 6 1 1797
##sequence-region 7 1 563
##sequence-region 8 1 331
##sequence-region 9 1 224

My colleague needs a list of the genes on the 3rd contig (i.e., sequence-region 3). How should I proceed? My idea is to select all rows from the GFF file that belong to the 3rd contig (using the filter “Seqname == 3”), extract the identifier (e.g., “FADBEKOL_00002”) from the last field (“Group”), and join the output with the TSV file from Prokka.
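Outside Galaxy, the same filter-extract-join idea can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: that the ninth GFF column holds `key=value` pairs including `ID` and/or `locus_tag`, that the Prokka TSV has a header row with a `locus_tag` column, and that the GFF may end with a `##FASTA` section. The function names and the file names in the usage comment are made up for illustration.

```python
import csv


def locus_tags_on_contig(gff_lines, contig):
    """Return the locus tags of features on `contig`, in file order, without duplicates."""
    tags = []
    seen = set()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # assumption: sequences follow this marker, no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        cols = line.split("\t")
        if len(cols) != 9 or cols[0] != contig:
            continue  # keep only well-formed rows on the requested contig
        # Column 9 is assumed to look like "ID=...;locus_tag=...;product=..."
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        tag = attrs.get("locus_tag") or attrs.get("ID")
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def genes_on_contig(gff_lines, tsv_lines, contig):
    """Join the contig's locus tags with the rows of the annotation TSV."""
    wanted = locus_tags_on_contig(gff_lines, contig)
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    by_tag = {row["locus_tag"]: row for row in reader}
    return [by_tag[tag] for tag in wanted if tag in by_tag]


# Usage with hypothetical file names:
# with open("annotation.gff") as gff, open("annotation.tsv", newline="") as tsv:
#     for row in genes_on_contig(gff, tsv, "3"):
#         print(row["locus_tag"], row.get("gene", ""), row.get("product", ""), sep="\t")
```

The contig name is compared as a string ("3"), since the first GFF column is text even when it looks numeric.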

Any other ideas are greatly appreciated.

Best, Andrej

2026 Update! Any text data can be parsed! Search under Text Manipulation for GFF-specific options, or try this tutorial: Hands-on: Data Manipulation Olympics (in the Introduction to Galaxy Analyses topic).