I ran featureCounts on the outputs of RNA STAR, using the DmelGCF GTF annotation. The data are paired-end, and I set featureCounts to count each read pair as a single fragment. Many reads were assigned to the annotation. However, the featureCounts output is missing some gene IDs that exist in the RNA STAR output file. I don’t think this is simply because no reads were assigned to those gene IDs, since in that case I should still see them listed with 0 counts.
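One quick way to pin down exactly which IDs are missing is to compare the gene ID columns of the two tables. Below is a minimal Python sketch. It assumes that the STAR file is a per-gene counts table (summary rows starting with N_, then one row per gene) and that the featureCounts table has a leading # comment line and a Geneid header row. Those formats are assumptions, so adjust the skipped rows to match your actual datasets. The gene IDs in the example are hypothetical.

```python
def read_ids(lines):
    """Collect gene IDs from the first tab-separated column of a counts table."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines and featureCounts' leading comment line
        first = line.split("\t", 1)[0]
        if first == "Geneid" or first.startswith("N_"):
            continue  # featureCounts header row, STAR summary rows (N_unmapped, ...)
        ids.add(first)
    return ids


def missing_from_featurecounts(star_lines, fc_lines):
    """Gene IDs present in the STAR table but absent from the featureCounts table."""
    return sorted(read_ids(star_lines) - read_ids(fc_lines))


if __name__ == "__main__":
    # Hypothetical example rows; with real files, pass open file handles instead,
    # e.g. missing_from_featurecounts(open("star.tab"), open("fc.tab")).
    star = ["N_unmapped\t10\t10\t10", "geneA\t3\t0\t3", "geneB\t0\t0\t0"]
    fc = ["# Program:featureCounts",
          "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam",
          "geneA\tchr2L\t1\t100\t+\t100\t3"]
    print(missing_from_featurecounts(star, fc))  # ['geneB']
```

If the list comes back non-empty, those are the IDs worth looking up in the GTF.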
Any feedback would be welcome. Please let me know if this is unclear.
Thanks so much.
Thanks, that helps. But I still don’t know why featureCounts ignores some gene_id values.
I looked at the BAM file in IGV. I noticed that IGV displays some genes by their common name rather than their gene_id. For example, CG9629, also known as Aldh7AI, can be found by searching for Aldh7AI but not for CG9629. So I searched the featureCounts output for Aldh7AI, but that gene was absent as well. I also checked the GTF and confirmed that every gene has a gene_id attribute (9th column).
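Having a gene_id is not the only requirement. By default, featureCounts counts reads against GTF rows of feature type exon (its -t option) and groups them by the gene_id attribute (its -g option). A gene_id whose rows include no row of the counted feature type therefore never becomes part of the annotation the tool builds. Such a gene would be missing from the output entirely rather than reported with 0. The following minimal Python sketch lists those genes. It assumes a standard tab-separated GTF with key "value"; attributes, so check which feature type and attribute your featureCounts run actually used. The example rows are hypothetical.

```python
import re
from collections import defaultdict

# Matches key "value" pairs in the GTF attribute column (9th column).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def feature_types_by_gene(gtf_lines):
    """Map each gene_id to the set of feature types (column 3) it appears with."""
    types = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a full GTF row
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" in attrs:
            types[attrs["gene_id"]].add(cols[2])
    return types


def genes_without_feature(gtf_lines, feature="exon"):
    """gene_ids that never occur on a row of the given feature type."""
    types = feature_types_by_gene(gtf_lines)
    return sorted(gene for gene, seen in types.items() if feature not in seen)


if __name__ == "__main__":
    gtf = [  # hypothetical rows
        "#gtf-version 2.2",
        'chr2L\tsrc\tgene\t1\t500\t.\t+\t.\tgene_id "geneA"; gene_name "alpha";',
        'chr2L\tsrc\texon\t1\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "tA";',
        'chr3R\tsrc\tgene\t10\t90\t.\t-\t.\tgene_id "geneB";',
    ]
    print(genes_without_feature(gtf))  # ['geneB']
```

With a real file, passing an open handle (for example genes_without_feature(open("annotation.gtf"))) works the same way, since the functions only iterate over lines.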
Those gene names come from the reference annotation GTF file. Maybe use a different source for that annotation?
You can also sometimes convert IDs after running this tool. One option is annotateMyIDs; there are also several “replace” tools. Search the tool panel to find them, and this tutorial has some examples: Data Manipulation Olympics
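The same kind of replacement can also be done directly from the GTF. This is a minimal Python sketch, not what annotateMyIDs or the replace tools do internally. It assumes the ID/name pairs can be read from the GTF's 9th column. The gene_name key is an assumption, because some annotation sources store the symbol under a different key, so check your file and pass the right one. IDs with no known name are kept unchanged, and all example values are hypothetical.

```python
import re

# Matches key "value" pairs in the GTF attribute column (9th column).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def id_to_name(gtf_lines, id_key="gene_id", name_key="gene_name"):
    """Build an ID -> name mapping from the GTF attribute column (first pair wins)."""
    mapping = {}
    for line in gtf_lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) < 9:
            continue  # comments and incomplete rows
        attrs = dict(ATTR_RE.findall(cols[8]))
        if id_key in attrs and name_key in attrs:
            mapping.setdefault(attrs[id_key], attrs[name_key])
    return mapping


def rename_counts(count_rows, mapping):
    """Swap IDs for names in (id, count) rows; unknown IDs are kept as they are."""
    return [(mapping.get(gene_id, gene_id), count) for gene_id, count in count_rows]


if __name__ == "__main__":
    gtf = ['chr2L\tsrc\tgene\t1\t500\t.\t+\t.\tgene_id "geneA"; gene_name "alpha";']  # hypothetical
    counts = [("geneA", 5), ("geneB", 0)]  # hypothetical (Geneid, count) rows
    print(rename_counts(counts, id_to_name(gtf)))  # [('alpha', 5), ('geneB', 0)]
```

Keeping unmapped IDs, rather than dropping them, means the renamed table has the same number of rows as the original counts.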
Keep in mind that the featureCounts output reflects how your data overlaps the features annotated on the reference genome. The tool compares the coordinates of the features in the GTF against the mapping coordinates of the reads, applies a set of rules to decide what to count and what not to, and outputs the result.
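To make that overlap logic concrete, here is a toy Python sketch. It is not featureCounts’ actual implementation. Each read span is compared against exon intervals. A read that overlaps exactly one gene is counted. A read that overlaps several genes is left uncounted, which is similar to featureCounts’ default handling of ambiguous reads. The sketch ignores strand, paired-end fragments, multi-mapping reads, and minimum-overlap settings, and all gene IDs and coordinates are hypothetical. Every annotated gene starts at 0, which is why a gene with no reads still appears with 0 counts, while a gene that is not in the annotation as the tool sees it never appears at all.

```python
def count_overlaps(genes, reads):
    """Toy overlap counter.

    genes: dict gene_id -> list of (chrom, start, end) exon intervals (1-based, inclusive)
    reads: list of (chrom, start, end) alignment spans, one per read
    """
    counts = {gene_id: 0 for gene_id in genes}  # every annotated gene starts at 0
    for r_chrom, r_start, r_end in reads:
        # All genes with at least one exon overlapping this read.
        hits = {
            gene_id
            for gene_id, exons in genes.items()
            for chrom, start, end in exons
            if chrom == r_chrom and start <= r_end and r_start <= end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        # 0 hits: no feature; more than 1 hit: ambiguous. Neither is counted here.
    return counts


if __name__ == "__main__":
    genes = {
        "geneA": [("chr2L", 100, 200)],
        "geneB": [("chr2L", 180, 300)],
        "geneC": [("chr3R", 1, 50)],
    }
    reads = [("chr2L", 110, 150), ("chr2L", 190, 195), ("chr2L", 250, 280), ("chr2L", 500, 600)]
    print(count_overlaps(genes, reads))  # {'geneA': 1, 'geneB': 1, 'geneC': 0}
```

Changing any input here (the intervals, the read spans, or the rule for ambiguous reads) changes the counts, which mirrors the point about trying different annotations, tools, and parameters.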
So changing any of those parts can change the result. You can try different annotation sources, different mapping or counting tools, or different parameters and see what happens. If the gene you are interested in does not show up in the result, there will be a reason why, but you’ll need to investigate.