I have the complete genome sequence (FASTA format) from NCBI for my reference genome, but no GTF/GFF3 file (it is a phage identified in our lab). I normalized the FASTA file using NormalizeFasta. My understanding is that this is sufficient for the BWA-MEM step, but I am unsure how to create my own GTF/GFF3 file for the downstream counting step (I'm planning to use featureCounts or htseq-count). Any advice would be appreciated.
Alexa
Sorry, I think I was unclear. The genome in NCBI has been annotated (there is an associated GenBank file), but I downloaded only the FASTA file of the genome. For BWA-MEM, I supplied the custom reference genome (the normalized FASTA file) and selected the option to build an index; that job is running now. Next, I want to run featureCounts or htseq-count, which, to my knowledge, require a GTF/GFF3 file. Which file should I use here?
The annotation from NCBI can be loaded into Galaxy. There is usually a separate annotation file available, but you can also extract one from a GenBank file (tool: Genbank to GFF3 converter).
For anyone else reading: you can convert reference annotation between formats (for example, GFF3 to GTF) with the tool gffread. The FAQs above have the full details.
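Outside Galaxy, the same GFF3-to-GTF conversion can be run from a script. The sketch below assumes `gffread` is installed and on `PATH`; it uses only gffread's `-T` (write GTF instead of GFF3) and `-o` (output file) options. The helper names and file names are hypothetical.

```python
import shutil
import subprocess


def gffread_gtf_command(annotation, out_gtf):
    """Build a gffread command that rewrites an annotation file as GTF."""
    # -T: main output is GTF instead of GFF3; -o: write it to out_gtf
    return ["gffread", annotation, "-T", "-o", out_gtf]


def convert_with_gffread(annotation, out_gtf):
    """Run gffread, failing clearly if it is not installed (assumption: on PATH)."""
    if shutil.which("gffread") is None:
        raise FileNotFoundError("gffread is not on PATH")
    subprocess.run(gffread_gtf_command(annotation, out_gtf), check=True)
```

gffread is built around transcript/exon models, so for a phage annotation that has only gene and CDS records it is worth inspecting the resulting GTF (for example, checking that `gene_id` values are present) before running featureCounts or htseq-count.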