ref_gene_id featurecounts

I am facing another problem with featureCounts. I aligned my paired-end reads to the mm10 genome with HISAT2 and ran the alignments through StringTie. I then merged all the StringTie outputs into a single transcriptome, and merged that transcriptome with an mm10 RefSeq GTF using StringTie Merge. I am now at the stage of counting each HISAT2 alignment file against the merged transcriptome. I don't want to use gene_id for this, as it only gives me MSTRG codes when I need the actual gene names, so I decided to use ref_gene_id as the identifier. However, every time I do this I get an error saying:

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is ‘ref_gene_id’

However, the enclosed images clearly show that ref_gene_id is present in the 9th column alongside gene_id and transcript_id. I'm really stuck and don't know what to do next. I have previously used ref_gene_id in another RNA-seq experiment and it worked fine.



I decided to redo my original StringTie runs, using mm10_RefSeq.gtf as the reference file and selecting the option to use reference transcripts only, so that I get gene name IDs rather than novel MSTRG ones.


The problem with the original annotation is (probably) that the identifier ref_gene_id is not included in every record.
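One way to check this outside Galaxy is to count, per feature type, how many records carry the attribute and how many do not. The following is only a rough sketch: the function names and the example records are made up for illustration, and it assumes the usual GTF layout (tab-separated, attributes written as key "value"; in column 9).

```python
import re
from collections import Counter

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def parse_attributes(column9):
    """Return the key "value" pairs of a GTF attribute column as a dict."""
    return dict(ATTR_RE.findall(column9))


def count_attribute_by_type(gtf_lines, attribute="ref_gene_id"):
    """Count, per feature type (column 3), records with and without the attribute."""
    present, missing = Counter(), Counter()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header/comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        target = present if attribute in parse_attributes(fields[8]) else missing
        target[fields[2]] += 1
    return present, missing


# Made-up records that imitate a merged GTF; not taken from the real file.
example = [
    "# made-up example\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_0001"; ref_gene_id "GeneA";\n',
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_0001"; exon_number "1"; ref_gene_id "GeneA";\n',
    'chr1\tStringTie\ttranscript\t2000\t2900\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";\n',
    'chr1\tStringTie\texon\t2000\t2900\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; exon_number "1";\n',
]

present, missing = count_attribute_by_type(example)
for feature_type in sorted(set(present) | set(missing)):
    print(feature_type, "with:", present[feature_type], "without:", missing[feature_type])
```

If any records of the feature type being counted show up in the "without" column, that would be consistent with the error above.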

The changes you made to the mapping will probably address that issue (reads won't be assigned to "genes" that lack the extra annotation). But you could also filter that GTF and remove any lines that you don't want counted. I can't see the "type" for each line, but featureCounts does use it, and it can be specified on the form.

Whatever GTF you decide to use, there are tools in Galaxy to map various gene/transcript identifiers to others, including from a custom mapping.

  • UniProt ID mapping and retrieval
  • AnnotateMyIDs
  • Replace column by values which are defined in a convert file
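For the last tool in that list, the convert file could in principle be built from the merged GTF itself. Below is a minimal sketch, assuming the records that have a reference match carry both gene_id and ref_gene_id in column 9; the function names and example records are hypothetical. Note that one MSTRG locus can overlap more than one reference gene, so the sketch keeps every match.

```python
import re
from collections import defaultdict

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def build_id_map(gtf_lines, source="gene_id", target="ref_gene_id"):
    """Map each source ID to the sorted list of target IDs seen on the same record."""
    mapping = defaultdict(set)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if source in attrs and target in attrs:
            mapping[attrs[source]].add(attrs[target])
    return {key: sorted(values) for key, values in mapping.items()}


def to_convert_table(mapping):
    """Render the mapping as two-column, tab-separated lines."""
    return ["{}\t{}".format(key, ",".join(values)) for key, values in sorted(mapping.items())]


# Made-up records for illustration only.
example_records = [
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_0001"; ref_gene_id "GeneA";\n',
    'chr1\tStringTie\ttranscript\t500\t1500\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_0002"; ref_gene_id "GeneB";\n',
    'chr1\tStringTie\ttranscript\t2000\t2900\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1";\n',
]

id_map = build_id_map(example_records)
convert_table = to_convert_table(id_map)
print("\n".join(convert_table))
```

Loci with no reference match (MSTRG.2 in the made-up data) are simply left out of the table, so their MSTRG codes would stay unchanged after the replacement.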

I will try those methods. Thank you for the advice :slight_smile:

I followed your advice and used the filter GFF by attribute function, entering ref_gene_id. I then used the filtered GTF in featureCounts and it worked :smiley: :smiley: :smiley: . Thank you so much, you have really helped me out :slight_smile:
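For anyone doing the same thing outside Galaxy, the filter amounts to keeping only the GTF lines whose 9th column contains ref_gene_id. This is a rough sketch of that idea and not the Galaxy tool's actual implementation; the function name and example records are made up.

```python
import re


def filter_gtf_by_attribute(lines, attribute="ref_gene_id"):
    """Yield comment lines plus the records whose 9th column has the attribute."""
    # The attribute must start the column or follow a ';' so that asking for
    # "gene_id" does not accidentally match "ref_gene_id".
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(attribute) + r' "')
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines as they are
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and pattern.search(fields[8]):
            yield line


# Made-up records for illustration only.
merged = [
    "# made-up example\n",
    'chr1\tStringTie\texon\t100\t900\t1000\t+\t.\tgene_id "MSTRG.1"; transcript_id "NM_0001"; exon_number "1"; ref_gene_id "GeneA";\n',
    'chr1\tStringTie\texon\t2000\t2900\t1000\t-\t.\tgene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; exon_number "1";\n',
]

filtered = list(filter_gtf_by_attribute(merged))
print("".join(filtered), end="")
```

After filtering, every remaining record has ref_gene_id, which is what featureCounts needs when that attribute is chosen as the gene identifier. Reads falling only on the removed (novel) records will no longer be counted.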


Hi Joe_Hoolachan,
I am so happy to find that the same problem has been solved, but I am wondering how you filtered the GTF lines. Could you explain the solution in detail? I would sincerely appreciate your response.

Another solution I found for ref_gene_id is to use the previous version of featureCounts rather than the current one, as that version recognises ref_gene_id. Are you also using the HISAT2 and StringTie packages? StringTie has been quite annoying with the MSTRG codes.