I had been using featureCounts pretty successfully until today. Today I noticed that, for a few of the datasets I’m analyzing, a large minority of reads are categorized as Unassigned_NoFeatures (for example, 1904548 Unassigned_NoFeatures vs. 8432736 Assigned). I am using “gene” as the feature type and “locus_tag” as the identifier attribute. I looked at my gff and noticed there are multiple loci for which the Type (column 3) is something other than “gene” (e.g., “pseudogene”).
To see if I could get more of those unassigned reads assigned, I opened the gff in Excel, changed the Type (column 3) for all the relevant loci to “gene,” saved it as a tab-delimited .txt file, and changed the extension to .gff. This is literally all I did to change the file. I tried running featureCounts again using all the same settings as before (other than using the new gff, obviously). Now I’m getting a fatal error:
ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is ‘locus_tag’
An example of attributes included in your GTF annotation is ‘“ID=gene-MXAN_RS38475;Dbxref=GeneID:70676530;Name=MXAN_RS38475;gbkey=Gene;gene_biotype=pseudogene;locus_tag=MXAN_RS38475;partial=true;pseudo=true;start_range=.,77828”’.
You can clearly see “locus_tag” in the example provided in the error, and I looked back at the gff and confirmed the attributes are in column 9. Any idea why featureCounts suddenly doesn’t recognize locus_tag in the attributes, when it worked before? (Also, if you have any other ideas about why I’m getting such high numbers of Unassigned_NoFeatures reads, I’m all ears!) Thanks in advance for any advice!
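
For reference, the column 3 edit described above can also be written as a short script instead of a spreadsheet step. This is only a minimal sketch, not what was actually run: the function names retype_features and retype_file are made up for illustration, and it assumes a tab-delimited GFF in which “pseudogene” is the only Type that should be relabeled as “gene.” It rewrites column 3 only and leaves every other column, including the attributes in column 9, exactly as it was.

```python
def retype_features(lines, old_types=("pseudogene",), new_type="gene"):
    """Yield GFF lines with column 3 changed from any of old_types to new_type.

    Hypothetical helper: the default old_types is an assumption based on the
    one example Type mentioned in the post.
    """
    for line in lines:
        # Pass comments, directives, and blank lines through untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Only rewrite well-formed 9-column feature lines; anything else
        # (for example sequence lines in a trailing FASTA section) is kept as-is.
        if len(fields) == 9 and fields[2] in old_types:
            fields[2] = new_type
        yield "\t".join(fields) + "\n"


def retype_file(src_path, dst_path, old_types=("pseudogene",), new_type="gene"):
    """Read src_path, relabel column 3, and write the result to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(retype_features(src, old_types, new_type))
```

It could be called as retype_file("original.gff", "retyped.gff"), where both file names are placeholders. Comparing a few lines of the output with the original in a plain text editor shows whether anything other than column 3 has changed.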