Featurecounts error using a gene annotation from a gff3 file

The annotation I am trying to use is the Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3.gz file linked here: Index of /Zm-B73-REFERENCE-NAM-5.0. I suspect this gff3 file is the source of the problem and needs to be altered somehow to work with featurecounts.

I have paired-end RNA-seq reads from maize that I am trying to convert to counts using featurecounts. I basically followed the steps outlined in this tutorial, with a few differences (e.g. my RNA library is forward-stranded and has longer reads): Hands-on: Reference-based RNA-Seq data analysis / Transcriptomics

Creating a collection of 2 paired-end datasets, quality control, and initial mapping with RNA-star all seemed to work, but when I ran featurecounts I got an error: “failed to find the gene identifier attribute in the 9th column of the provided GTF file. The specified gene identifier attribute is ‘gene_id’. An example of attributes included in your GTF annotation is ‘Parent=Zm00001eb000010_T001;Name=Zm00001eb000010_T001.exon.1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Zm00001eb000010_T001.exon.1;rank=1’.”

Then I looked into my file and saw that it was using “gene_id” as an identifier in the 9th column, so I set “GFF gene identifier” to gene_id explicitly; however, I still got the “failed to find the gene identifier attribute in the 9th column of the provided GTF file” error.

Then I tried leaving the “GFF feature type filter” and “GFF gene identifier” fields empty, but I got an error saying “no features were loaded in format GTF. The annotation format can be specified by the ‘-F’ option, and the required feature type can be specified by the ‘-t’ option.”
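Both errors come down to which feature types (3rd column) and which attribute keys (9th column) the annotation actually contains. One way to see that is to tally them per feature type. Below is a minimal Python sketch, not part of any Galaxy tool: the exon attributes are copied from the error message above, while the gene and mRNA rows, the `chr1` sequence name, the `NAM` source and the coordinates are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny illustrative excerpt. Only the exon attributes come from the error
# message in this thread; everything else here is an assumption.
SAMPLE_GFF3 = """\
##gff-version 3
chr1\tNAM\tgene\t100\t900\t.\t+\t.\tID=Zm00001eb000010;biotype=protein_coding
chr1\tNAM\tmRNA\t100\t900\t.\t+\t.\tID=Zm00001eb000010_T001;Parent=Zm00001eb000010
chr1\tNAM\texon\t100\t300\t.\t+\t.\tParent=Zm00001eb000010_T001;Name=Zm00001eb000010_T001.exon.1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Zm00001eb000010_T001.exon.1;rank=1
"""


def summarize_attributes(lines):
    """Map each feature type (column 3) to a Counter of attribute keys (column 9)."""
    keys_by_type = defaultdict(Counter)
    for line in lines:
        # Skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        feature_type, attributes = cols[2], cols[8]
        for pair in attributes.split(";"):
            key, sep, _value = pair.partition("=")
            if sep:
                keys_by_type[feature_type][key.strip()] += 1
    return keys_by_type


summary = summarize_attributes(SAMPLE_GFF3.splitlines())
for feature_type, keys in summary.items():
    print(feature_type, sorted(keys))
```

For the real annotation, the same function could be fed the lines of the file opened with `gzip.open(path, "rt")`. Any key that is missing for a given feature type, such as `gene_id` on the exon rows here, cannot be used as the gene identifier for that feature type.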

What I tried next: I learned that featurecounts requires a genome to be specified, so I created a custom genome by uploading the fasta file Zm-B73-REFERENCE-NAM-5.0.fa.gz from the page linked above. I ran NormalizeFasta as suggested to create a uniform line length of 80 and created a custom genome, maize_v5. I went back in my history and tagged the gff3 file, the reads, and the RNA Star outputs as being associated with maize_v5 and ran it again, but the same errors occurred.

What should I do next to troubleshoot this problem?
Thanks!

Hi @Claire_Milsted

Some tools only accept GTF. Some also accept GFF3 and those will have options on the tool forms where you can change the attributes to use when counting up reads versus features in the annotation.

Featurecounts happens to accept GFF3, so you need to make some adjustments to those settings.

Converting to GTF is also an option. See Working with GFF GFT GTF2 GFF3 reference annotation
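To illustrate what such a conversion has to do for exon rows: each exon only names its transcript in `Parent`, so the gene has to be looked up through the transcript row. The following is a rough Python sketch under several assumptions, not a replacement for a real converter: it assumes every exon has a single `Parent`, that the transcript row carries both `ID` and `Parent`, and it uses made-up gene and mRNA rows, sequence name, source and coordinates. The function names are hypothetical.

```python
# Illustrative rows only; the IDs follow the pattern seen in this thread,
# the rest is assumed.
SAMPLE_GFF3 = """\
##gff-version 3
chr1\tNAM\tgene\t100\t900\t.\t+\t.\tID=Zm00001eb000010
chr1\tNAM\tmRNA\t100\t900\t.\t+\t.\tID=Zm00001eb000010_T001;Parent=Zm00001eb000010
chr1\tNAM\texon\t100\t300\t.\t+\t.\tParent=Zm00001eb000010_T001;rank=1
"""


def parse_attributes(column9):
    """Parse a GFF3 'key=value;key=value' string into a dict."""
    attrs = {}
    for pair in column9.split(";"):
        key, sep, value = pair.partition("=")
        if sep:
            attrs[key.strip()] = value.strip()
    return attrs


def exons_as_gtf(lines):
    """Return GTF-style exon lines carrying gene_id and transcript_id."""
    rows = [l.rstrip("\n").split("\t") for l in lines
            if l.strip() and not l.startswith("#")]
    rows = [r for r in rows if len(r) == 9]

    # First pass: child ID -> parent ID (e.g. transcript -> gene)
    parent_of = {}
    for cols in rows:
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            parent_of[attrs["ID"]] = attrs["Parent"]

    # Second pass: rewrite exon rows with GTF-style attributes
    out = []
    for cols in rows:
        if cols[2] != "exon":
            continue
        transcript_id = parse_attributes(cols[8]).get("Parent")
        gene_id = parent_of.get(transcript_id)
        if gene_id is None:
            continue  # exon whose transcript cannot be traced to a gene
        gtf_attrs = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
        out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out


gtf_lines = exons_as_gtf(SAMPLE_GFF3.splitlines())
print("\n".join(gtf_lines))
```

With rows in this form, the default combination of feature type “exon” and gene identifier “gene_id” has something to match.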

This forum also has a LOT of troubleshooting about reference annotation. Search with tool names or datatypes. I also added some tags.

For the reference genome, see Reference genomes

Upon closer inspection I think the problem might be with the column names in the gff3 file. In the Drosophila gtf file, which worked, the column containing feature types is called “Feature”, but in the maize gff3 file, which didn’t work, the column is called “Type”. I don’t know how to fix this.

When I try to convert to GTF using gffread, I get an error saying “overlapping duplicate BED feature (ID=mRNA)”.

Update: It seems to have worked! I was able to get reasonable output by just setting GFF feature type filter to “gene” and GFF gene identifier to “ID”. It didn’t work before when I used “exon” and “ID”, because the exon rows don’t have an “ID” attribute assigned to them, only a “Parent”. Since what I’m interested in is just the gene counts, I’m happy with this result. Thanks for your hint to focus very carefully on the feature type and gene identifier terms.
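For anyone hitting the same error, a candidate combination of feature type and identifier can be checked against the annotation before running the tool. This is a small Python sketch with a hypothetical helper name; the exon attributes are copied from the error message earlier in the thread, and the gene and mRNA rows, sequence name, source and coordinates are assumed for illustration.

```python
# Illustrative rows; only the exon attributes are taken from this thread.
SAMPLE_GFF3 = """\
##gff-version 3
chr1\tNAM\tgene\t100\t900\t.\t+\t.\tID=Zm00001eb000010
chr1\tNAM\tmRNA\t100\t900\t.\t+\t.\tID=Zm00001eb000010_T001;Parent=Zm00001eb000010
chr1\tNAM\texon\t100\t300\t.\t+\t.\tParent=Zm00001eb000010_T001;Name=Zm00001eb000010_T001.exon.1;ensembl_end_phase=0;ensembl_phase=0;exon_id=Zm00001eb000010_T001.exon.1;rank=1
"""


def count_identifier_coverage(lines, feature_type, attribute):
    """Return (rows of this feature type, rows among them that carry the attribute)."""
    matching = 0
    with_attribute = 0
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        matching += 1
        # Attribute keys are compared exactly, so "exon_id" does not count as "ID"
        keys = {pair.partition("=")[0].strip() for pair in cols[8].split(";")}
        if attribute in keys:
            with_attribute += 1
    return matching, with_attribute


lines = SAMPLE_GFF3.splitlines()
for feature_type, attribute in [("gene", "ID"), ("exon", "ID"),
                                ("exon", "gene_id"), ("exon", "Parent")]:
    total, ok = count_identifier_coverage(lines, feature_type, attribute)
    print(f"{feature_type} + {attribute}: {ok} of {total} rows have the attribute")
```

A combination is only usable when the first number is above zero and the two numbers agree. Note that “exon” with “Parent” would also pass this check on rows like these, but it would group reads by transcript ID rather than by gene.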
