Difference Between using mm10 vs GRCm38 GFF/GTF

Hi @macmade

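Both mm10 and GRCm38 refer to the same reference genome assembly: mm10 is the UCSC name and GRCm38 is the Genome Reference Consortium name. What can differ between sources is the chromosome labeling (for example, "chr1" at UCSC versus "1" at Ensembl).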

Reference annotation based on that genome assembly can also differ depending on who created it, both in content and in chromosome labeling.

There are on the order of 20-30k protein-coding genes annotated for mouse, depending on the source. The annotation built into featureCounts represents genes only, and is based on Entrez gene IDs. Other reference annotation sources may contain other genomic features, including the transcripts associated with each gene.

Check the Ensembl IDs to learn if they represent genes, transcripts, and/or other features. I’m guessing transcripts from the count, but you should confirm that.
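One quick way to check is to look at the ID prefixes. Ensembl mouse IDs start with ENSMUSG for genes, ENSMUST for transcripts, ENSMUSP for proteins, and ENSMUSE for exons. Here is a minimal sketch that tallies a list of IDs by type. The function names and the example IDs are just for illustration; you would substitute the first column of your own count table.

```python
from collections import Counter

# Ensembl mouse stable-ID prefixes and the feature type each one denotes.
PREFIXES = {
    "ENSMUSG": "gene",
    "ENSMUST": "transcript",
    "ENSMUSP": "protein",
    "ENSMUSE": "exon",
}


def classify_ensembl_id(identifier):
    """Return the feature type implied by the ID prefix, or 'other'."""
    for prefix, kind in PREFIXES.items():
        if identifier.startswith(prefix):
            return kind
    return "other"


def summarize_ids(identifiers):
    """Count how many IDs fall into each feature type."""
    return Counter(classify_ensembl_id(i) for i in identifiers)


# Illustrative IDs only; replace with the IDs from your own count table.
example_ids = [
    "ENSMUSG00000000001",
    "ENSMUST00000000001.4",  # a version suffix does not affect the prefix check
    "ENSMUST00000000028",
    "12345",                 # e.g. an Entrez-style numeric ID
]
summary = summarize_ids(example_ids)
print(dict(summary))
```

If nearly everything comes back as "transcript", the counts are per transcript rather than per gene.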

Whether you want counts by transcript or by gene depends on your analysis goals. The tool you describe as one that “merges several genes together” might actually be merging transcripts into genes… but that is another guess.
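To make the guess concrete, this is roughly what grouping transcript-level rows under their parent gene looks like. It is only a sketch of the idea, not how any particular tool does it, and the IDs, counts, and the `collapse_to_genes` helper are all made up for the example.

```python
from collections import defaultdict


def collapse_to_genes(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts under each parent gene.

    Transcripts missing from the mapping are skipped.
    """
    gene_counts = defaultdict(int)
    for transcript_id, count in transcript_counts.items():
        gene_id = transcript_to_gene.get(transcript_id)
        if gene_id is None:
            continue
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical IDs and counts, for illustration only.
transcript_counts = {"TX_A1": 10, "TX_A2": 5, "TX_B1": 7, "TX_unmapped": 3}
transcript_to_gene = {"TX_A1": "GENE_A", "TX_A2": "GENE_A", "TX_B1": "GENE_B"}

gene_counts = collapse_to_genes(transcript_counts, transcript_to_gene)
print(gene_counts)
```

A result like this has fewer rows than the transcript-level input, which could look like "genes being merged" if the input rows were assumed to be genes.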

All inputs to analysis should be based on the same reference genome assembly AND build (matching chromosome identifiers). The assemblies are already the same. You’ll need to check if chromosome identifiers are a match.
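A small sketch of that check, assuming you have the annotation as GTF/GFF lines and the chromosome names used by your alignments (for a BAM these are in the @SQ header lines, which `samtools view -H` prints). The helper names and the example rows are illustrative, not taken from your data.

```python
def gtf_seqnames(lines):
    """Collect the chromosome names (first column) from GTF/GFF lines."""
    names = set()
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(annotation_names, alignment_names):
    """Report which chromosome names are shared and which are unique to each input."""
    a, b = set(annotation_names), set(alignment_names)
    return {
        "shared": a & b,
        "annotation_only": a - b,
        "alignment_only": b - a,
    }


# Illustrative rows: an Ensembl-style annotation ("1", "MT") ...
gtf_lines = [
    "#!genome-build GRCm38",
    "1\tsource\tgene\t1000\t2000\t.\t+\t.\tgene_id \"GENE_A\";",
    "MT\tsource\tgene\t1\t68\t.\t+\t.\tgene_id \"GENE_B\";",
]
# ... against UCSC-style names ("chr1", "chrM") from the alignments.
alignment_names = ["chr1", "chrM"]

report = compare_seqnames(gtf_seqnames(gtf_lines), alignment_names)
print(report)
```

An empty "shared" set, as in this example, means the counting tool will find no overlap between reads and features, so the counts will be all zeros until the identifiers are made to match.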

You can use the built-in annotation with featureCounts, or supply other annotation to either featureCounts or htseq-count.

See these tutorials for more help:

I also added a few tags to your post that point to prior Q&A that cover common annotation sources, ways to convert IDs, plus methods to address format and content.

Thanks!