Custom reference genome annotation

Hello,

I have the complete genome sequence (FASTA format) from NCBI for my reference genome, but no access to a GTF/GFF3 file (it is a phage identified in our lab). I normalized the FASTA file using NormalizeFasta. My understanding is that this is sufficient for the bwa-mem step, but I am unsure how to create my own GTF/GFF3 file for the downstream counting step (I plan to use featureCounts or htseq-count). Any advice would be appreciated.
Alexa


Hi @AlexaDean

Please see the GTN tutorials here:

and maybe here, too:

Hello,

Sorry, I think I was unclear. The genome in NCBI has already been annotated (there is an associated GenBank file), but I downloaded only the FASTA file of the genome. For bwa-mem, I supplied the custom reference genome (the normalized FASTA file) and selected the option to build the index; that job is running now. Next I want to run featureCounts or htseq-count, which, to my knowledge, require a GTF/GFF3 file. Which file should I use here?


The annotation from NCBI can be loaded into Galaxy. There is usually a separate annotation file available, but you can also extract one from a GenBank file (tool: Genbank to GFF3 converter).
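To show roughly what a GenBank-to-GFF3 conversion does, here is a minimal Python sketch (not the Galaxy converter's code) that turns `gene` and `CDS` features into GFF3 rows. It assumes a single-record GenBank file, only simple `start..end` and `complement(start..end)` locations (compound `join(...)` features are skipped), and a `/locus_tag` shared by each gene and its CDS. The `gene-<locus_tag>` / `cds-<locus_tag>` ID scheme is hypothetical.

```python
import re

# Simple GenBank locations only, e.g. "190..255" or "complement(3300..>4037)".
# Compound locations such as join(...) or order(...) do not match and are skipped.
SIMPLE_LOCATION = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")


def gff3_escape(value):
    """Percent-encode characters with a reserved meaning in GFF3 column 9."""
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"),
                       ("&", "%26"), (",", "%2C"), ("\t", "%09")):
        value = value.replace(char, code)
    return value


def parse_features(genbank_text):
    """Return (seqid, [[type, location, {qualifier: raw_value}], ...]) for one record."""
    seqid, features, in_table, last_qualifier = None, [], False, None
    for line in genbank_text.splitlines():
        if line.startswith(("LOCUS", "VERSION")):
            seqid = line.split()[1]  # VERSION (accession.version) overrides LOCUS
        elif line.startswith("FEATURES"):
            in_table = True
        elif not line.startswith(" "):
            in_table = False  # any other top-level keyword (ORIGIN, CONTIG, //) ends the table
        elif in_table and line.strip():
            key, value = line[5:21].strip(), line[21:].strip()
            if key:  # a new feature key starts in column 6
                features.append([key, value, {}])
                last_qualifier = None
            elif value.startswith("/"):  # a new /qualifier=value line
                name, _, raw = value[1:].partition("=")
                features[-1][2][name] = raw
                last_qualifier = name
            elif last_qualifier:  # wrapped qualifier value
                features[-1][2][last_qualifier] += " " + value
            else:  # wrapped location string
                features[-1][1] += value
    return seqid, features


def genbank_to_gff3(genbank_text, keep=("gene", "CDS")):
    """Convert simple gene/CDS features of one GenBank record to GFF3 text."""
    seqid, features = parse_features(genbank_text)
    rows = ["##gff-version 3"]
    for index, (ftype, location, quals) in enumerate(features, start=1):
        match = SIMPLE_LOCATION.match(location)
        if ftype not in keep or match is None:
            continue
        strand = "-" if match.group(1) else "+"
        tag = quals.get("locus_tag", "").strip('"')
        label = gff3_escape(tag or f"feature{index}")  # hypothetical fallback ID
        attrs = [f"ID={ftype.lower()}-{label}"]
        if ftype == "CDS" and tag:
            attrs.append(f"Parent=gene-{label}")  # assumes a gene with the same locus_tag
        if tag:
            attrs.append(f"locus_tag={label}")
        if "product" in quals:
            attrs.append("product=" + gff3_escape(quals["product"].strip('"')))
        phase = "0" if ftype == "CDS" else "."  # assumes each CDS starts in frame
        rows.append("\t".join([seqid, "GenBank", ftype, match.group(2),
                               match.group(3), ".", strand, phase, ";".join(attrs)]))
    return "\n".join(rows)
```

Column 1 is taken from the VERSION line (accession.version), which is usually also the first word of an NCBI FASTA header, so the rows should line up with the reference used for bwa-mem.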

FAQs that may help you organize, format, and label the inputs to avoid conflicts: Galaxy Support - Galaxy Community Hub
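One conflict worth checking before counting: the sequence names in column 1 of the annotation must match the reference names in the BAM file, which bwa takes from the first word of each FASTA header. A minimal Python sketch of that check, assuming plain-text FASTA and GTF/GFF3 inputs (the function names are hypothetical):

```python
def fasta_ids(fasta_text):
    """Reference names as aligners report them: the first word after each '>'."""
    return {line[1:].split()[0]
            for line in fasta_text.splitlines()
            if line.startswith(">") and line[1:].split()}


def annotation_seqids(annotation_text):
    """Column 1 of every feature row in a GTF or GFF3 file."""
    seqids = set()
    for line in annotation_text.splitlines():
        if line.startswith("##FASTA"):
            break  # GFF3 may embed sequences after this directive
        if line.strip() and not line.startswith("#"):
            seqids.add(line.split("\t")[0])
    return seqids


def unmatched_seqids(fasta_text, annotation_text):
    """Annotation sequence names that do not appear in the reference FASTA."""
    return sorted(annotation_seqids(annotation_text) - fasta_ids(fasta_text))
```

An empty list means every annotated sequence name is present in the reference; anything listed would likely produce zero counts for its features.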


Hello,

Just to follow up: I realize now that the GFF3 file does not work with featureCounts, so this does not solve my problem.

For anyone else reading: you can convert reference annotation between formats (for example, GFF3 to GTF) with the tool gffread. The FAQs above have the full details.
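gffread is the practical route (its `-T` option writes GTF). For a small phage annotation, the core of the change is simple enough to sketch: featureCounts and htseq-count count `exon` features grouped by a `gene_id` attribute by default (featureCounts `-t`/`-g`, htseq-count `-t`/`-i`), while an NCBI-derived GFF3 typically has `gene`/`CDS` rows with `ID`/`Parent`/`locus_tag` attributes. This Python sketch, an illustration rather than what gffread does internally, relabels CDS rows as `exon` and writes `gene_id`/`transcript_id` from `locus_tag` (falling back to `Parent`, then `ID`). The relabeling is an assumption; keeping `CDS` and changing the counting tool's feature-type setting is an alternative.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split GFF3 column 9 ("key=value;key=value") into a dict, decoding %XX escapes."""
    attrs = {}
    for pair in column9.strip().strip(";").split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            attrs[key.strip()] = unquote(value)
    return attrs


def gff3_to_gtf(gff3_text, source_type="CDS", gtf_type="exon"):
    """Rewrite source_type rows of a GFF3 file as GTF rows carrying gene_id/transcript_id."""
    gtf_rows = []
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # no feature rows after embedded sequences
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != source_type:
            continue
        attrs = parse_gff3_attributes(cols[8])
        # Hypothetical ID choice: locus_tag, else first Parent, else ID.
        gene = (attrs.get("locus_tag")
                or attrs.get("Parent", "").split(",")[0]
                or attrs.get("ID"))
        if not gene:
            continue
        gene = gene.replace('"', "")  # GTF attribute values are double-quoted
        cols[2] = gtf_type
        if gtf_type != "CDS":
            cols[7] = "."  # frame is only meaningful for coding features
        cols[8] = f'gene_id "{gene}"; transcript_id "{gene}";'
        gtf_rows.append("\t".join(cols))
    return "\n".join(gtf_rows)
```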

An example of preparing data from NCBI (both the genome and the annotation) is in this topic: Inquiry about lettuce genome - #2 by jennaj