Extract list of genes for annotated contig

andrej · March 18, 2019, 12:37pm

Hi,

I processed an E. coli genome using a standard pipeline (Unicycler) and annotated it using Prokka. The first few lines of the GFF file are pasted below:
##gff-version 3
##sequence-region 1 1 2860445
##sequence-region 2 1 1613318
##sequence-region 3 1 142357
##sequence-region 4 1 17570
##sequence-region 5 1 3029
##sequence-region 6 1 1797
##sequence-region 7 1 563
##sequence-region 8 1 331
##sequence-region 9 1 224

My colleague needs a list of the genes on the 3rd contig (i.e., sequence-region 3). How should I proceed? My idea is to select all rows that belong to the 3rd contig from the GFF file (using the filter “Seqname == 3”), extract the identifier (e.g., “FADBEKOL_00002”) from the last field, “Group”, and join the output with the TSV file from Prokka.
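Outside Galaxy, the same idea could be sketched in Python roughly as follows. This is only a sketch under a few assumptions: that the identifier sits in the ID attribute of the ninth GFF column, that the Prokka TSV has a header row with a locus_tag column, and that the function names and the file names in the trailing comment are placeholders made up for illustration.

```python
import csv


def locus_tags_on_contig(gff_lines, contig):
    """Return the ID attribute of every feature on `contig`, in file order."""
    tags = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this marker is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) != 9 or fields[0] != contig:
            continue  # malformed row, or a feature on another contig
        # Column 9 holds key=value pairs separated by ';'
        attrs = dict(p.split("=", 1) for p in fields[8].split(";") if "=" in p)
        if "ID" in attrs:
            tags.append(attrs["ID"])
    return tags


def annotate(tags, tsv_lines):
    """Join the identifiers with the TSV rows (assumes a 'locus_tag' column)."""
    reader = csv.DictReader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    by_tag = {row["locus_tag"]: row for row in reader}
    # Identifiers without a matching TSV row are silently dropped
    return [by_tag[t] for t in tags if t in by_tag]


# Hypothetical usage (file names are placeholders):
# with open("annotation.gff") as gff, open("annotation.tsv") as tsv:
#     for row in annotate(locus_tags_on_contig(gff, "3"), tsv):
#         print(row["locus_tag"], row.get("gene", ""), row.get("product", ""))
```

The contig name is compared as a string ("3"), because the first GFF column is text even when it looks numeric.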

Any other ideas are greatly appreciated.

Best, Andrej
