I am trying to use the featureCounts tool. I have read through the community support questions from people with similar issues, and I have tried to follow the advice I understood, but none of it has helped. The message says:
"ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is ‘gene_id’
An example of attributes included in your GTF annotation is ‘gene_id’."
In the field “GFF feature type filter” I have tried gene, exon and CDS. In the field “GFF gene identifier” I have tried “gene_id”, “gene_name” and “gene_source”, and none of those options has worked. I have checked my GTF file, and the 9th column does indeed contain “gene_id”.
I am using the hg38 human genome assembly. I am pretty new to Galaxy as well, so I would appreciate explanations for “beginners”.
I have followed these GalaxyHelp links without success, so anything different will be appreciated: link1, link2, link3, link4
Thank you for your reply. My issue is not similar to the one shared in the thread, as the tool stops before being able to give results at all. See below the error shown:
I figure this could be because I am using a collection? So if because of that I cannot use the built-in option, how can I sort out the issue? I have verified and my gtf file does show “gene_id” in the 9th column, so the GFF gene identifier should be correct, although the error claims otherwise.
Hi Adriana,
featureCounts has a little bug affecting built-in genomes.
It should work with collections.
The error log points to an issue with the annotation file. Maybe paste one gene annotation in your reply and I’ll check it.
Alternatively, you can share the history with me, I can have a look. In History Option (small triangle icon at the top right corner of the history panel) select Share or Publish, in the middle window make history accessible, copy the URL link and paste it into reply. Ideally, it should be a small history. You can copy relevant datasets into a new history and share it.
Sorry for the late reply, and thank you for helping out. I tried again with the built-in genome, but I do think it has a bug: the tool does not seem to recognize it. I somehow got lucky last week and it worked for one (which is why I delayed my reply), but if I try to run it again I face the same issue:
I have made a new History that contains only the relevant datasets, hopefully you can have a look at it.
Technical issue → only four genomes are supported with the built-in annotation. It is noted on the tool form, and in your screenshot. Your data was not mapped against one of those, so the option is not available.
Scientific issue → Transcriptome analysis will be much more stable when using hg38. Try mapping against that version of the human genome instead. Review my guide, and try an internet search, to understand why.
Thank you for your reply. However, I am doing the Bowtie2 mapping with the hg38 assembly (as well as one in a different history not linked here), and in neither of those am I able to select the hg38 built-in genome option. The same message I shared above appears: “! Please provide a value for this option. Select built-in genome. No options available”.
So I have that problem when mapping against hg38 with Bowtie2 as seen in the linked history or with those (Bowtie2, HISAT2, RNA STAR) not shared in the link.
Also, as I described initially and explained to Igor above, even when using hg38 and HISAT2, the fatal error appears after running featureCounts with a GTF from my history, which is why I tried to use the built-in genome option.
So, when I use the GTF from my history against hg38, featureCounts shows a fatal error (described earlier), and when I try to use the built-in genome against hg38, the option is not available.
I appreciate your reply but unfortunately it does not solve my issue.
Hi Adriana,
Thank you for the shared history. Collection #9 (bowtie2) has PE reads mapped to hg38. I used featureCounts with built-in gene models on this collection with no issue. As Jennifer pointed out, the other collections use reference genomes not compatible with the built-in models.
The procedure:
find featureCounts in the history panel, click on it
change Gene annotation file to featureCounts built-in
change type of input to collection and select collection #9
change Does the input have read pairs to Yes, paired end and count them as 1 single fragment.
hit Run tool
History with completed featureCounts job
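For readers curious about what the form fields mean outside Galaxy, here is a minimal sketch that only assembles a featureCounts command line and does not run it. It assumes a locally installed Subread featureCounts, version 2.0.2 or later (where --countReadPairs is available). The file names are hypothetical, and the mapping from the Galaxy form labels (“GFF feature type filter”, “GFF gene identifier”, the read-pair option) to these flags is an assumption, not something confirmed in this thread.

```python
def build_featurecounts_command(bam_files, annotation_gtf, output_path,
                                feature_type="exon", gene_attribute="gene_id",
                                paired_end=True):
    """Assemble (but do not run) a featureCounts argument list.

    All file names passed in are placeholders chosen by the caller.
    """
    command = [
        "featureCounts",
        "-a", annotation_gtf,   # annotation file (GTF)
        "-o", output_path,      # where the count table would be written
        "-t", feature_type,     # feature type in column 3 to count over
        "-g", gene_attribute,   # attribute in column 9 used to group features
    ]
    if paired_end:
        # -p marks the input as paired-end; --countReadPairs counts each
        # pair as a single fragment instead of two reads.
        command += ["-p", "--countReadPairs"]
    command += list(bam_files)
    return command


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(build_featurecounts_command(
        ["sample1.bam"], "annotation.gtf", "counts.txt")))
```

The function returns a plain list so it can be inspected or handed to subprocess.run later; nothing is executed here.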
Thank you. I tried it in the same order you describe and that seems to fix it, though it is strange that selecting the collection first and then the gene annotation file makes it “crash”. Anyway, thank you for your advice and the time you took to help sort this out! Turns out it was easier than I thought. I still have the fatal error issue when trying to use my own annotation, but at least for now using hg38 will not be a problem, as I can continue with the built-in one.
Also, thank you Jennifer for your input; it is good to know about the compatibility with built-in models. I suppose this is the kind of thing we beginners overlook.
Hi Adriana,
I am sorry, I forgot about the annotation file issue! Do you mean dataset #55 in the shared history? It does not look right. Click on the name of the dataset. The brief description says: 1 line, 2,905,058 comments, while the expected numbers should be something like 2,905,054 lines, 5 comments. It seems the file was poorly processed (during upload to Galaxy?), and all the content ended up in the 1st column as space-separated values. Basically, it is not a GTF file because of incorrect (missing) tab separators within the text. The number of columns is very high for GTF/GFF. I don’t remember if I have seen anything like this in Galaxy. I cannot think of an easy fix for it. I guess it is doable, but I would rather get a proper annotation.
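To make the tab-separator problem concrete, here is a minimal Python sketch of the kind of check one could run on a local copy of the file. It is not what featureCounts or Galaxy does internally; the function names and the example line are made up for illustration. It assumes the line breaks are intact and that none of the first eight fields contains a space, which may not hold for the dataset discussed here.

```python
import re


def find_gtf_problems(lines, attribute="gene_id"):
    """Return (line_number, message) pairs for data lines that do not look like GTF."""
    # Attribute followed by a quoted value, at the start of column 9 or after a ';'.
    pattern = re.compile(rf'(?:^|;)\s*{re.escape(attribute)}\s+"[^"]+"')
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith("#"):
            continue  # skip blank lines and header comments
        fields = text.split("\t")
        if len(fields) != 9:
            problems.append(
                (number, f"expected 9 tab-separated columns, found {len(fields)}"))
        elif not pattern.search(fields[8]):
            problems.append(
                (number, f"attribute {attribute!r} not found in column 9"))
    return problems


def retab_line(line):
    """Tentatively rebuild a space-separated GTF line with tab separators.

    Splits on whitespace at most 8 times, so the attribute column keeps its
    internal spaces. Only valid under the assumptions stated above.
    """
    parts = line.strip().split(None, 8)
    if len(parts) != 9:
        raise ValueError("cannot recover 9 columns from this line")
    return "\t".join(parts)


if __name__ == "__main__":
    # Made-up example line, not taken from the shared history.
    broken = 'chr1 demo exon 100 200 . + . gene_id "G1"; gene_name "demo";'
    print(find_gtf_problems([broken]))
    print(find_gtf_problems([retab_line(broken)]))
```

Even if such a repair appears to work, downloading a fresh annotation is the safer route, since a file damaged in one way may be damaged in others.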
Maybe try the GENCODE annotation. Make sure you use an annotation with chromosome names chr1, chr2 etc., not 1, 2 etc.
How did you upload the annotation file? If possible, try uploading by URL, and avoid editing the file on a local machine if it runs Windows.
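The chromosome naming point can be checked quickly on a local copy of an annotation. The sketch below is only an illustration: the function name and the labels it returns are made up here, and it simply looks at whether the names in the first column start with “chr”.

```python
def chromosome_naming_style(lines):
    """Classify the sequence names in column 1 of tab-separated GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        names.add(line.split("\t", 1)[0])
    if not names:
        return "empty"
    prefixed = {name for name in names if name.startswith("chr")}
    if prefixed == names:
        return "chr-prefixed"   # chr1, chr2, ... as recommended above
    if not prefixed:
        return "unprefixed"     # 1, 2, ...
    return "mixed"


if __name__ == "__main__":
    # Made-up lines for illustration only.
    example = [
        "#!header comment\n",
        'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1";\n',
        'chr2\tdemo\texon\t300\t400\t.\t-\t.\tgene_id "G2";\n',
    ]
    print(chromosome_naming_style(example))
```

The same check applied to the reference names of the mapped reads should give the same answer as for the annotation; if one is prefixed and the other is not, the reads will not be assigned to any feature.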