I’ve mapped RNA-seq data with Bowtie2 to the GRCm38 genome and I am trying to count reads using htseq-count.
I have imported GTF files from both UCSC and Ensembl, but I cannot get the right annotation. It works with Ensembl transcript IDs, but when I select gene_id all counts are zero.
I’ve tried a GTF file from GENCODE and it worked just fine.
As before, I used “Feature type = gene” and “ID Attribute = gene_id”, so I am still puzzled why I don’t get gene counts with the other GTFs.
With the UCSC GTF, the problem was probably the “Feature type = gene” setting: that file likely has no rows whose feature type is “gene”, so nothing matched. (It is not a good choice anyway: in GTFs from this source, gene_id and transcript_id have the same value, which effectively means all counts will be by transcript even when they are reportedly by gene.)
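To check this on a GTF you already have, it can help to look at which feature types it contains (column 3) and whether its gene_id and transcript_id values coincide. The sketch below is a minimal, stdlib-only Python example under stated assumptions: the function names are hypothetical, the attribute parser assumes the common `key "value";` layout of column 9, and any sample rows you feed it for testing are made up rather than copied from a real UCSC file.

```python
import gzip
from collections import Counter


def open_gtf(path):
    """Open a plain-text or gzipped (.gz) GTF file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_attributes(field):
    """Parse a column-9 string like 'gene_id "A"; transcript_id "B";' into a dict.

    Assumes the common GTF layout of space-separated key/quoted-value pairs.
    Repeated keys (e.g. several 'tag' entries) keep only the last value.
    """
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def summarize_gtf(lines):
    """Return (feature-type counts, number of rows where gene_id == transcript_id)."""
    feature_types = Counter()
    same_ids = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a well-formed GTF row
        feature_types[cols[2]] += 1
        attrs = parse_attributes(cols[8])
        if "transcript_id" in attrs and attrs.get("gene_id") == attrs["transcript_id"]:
            same_ids += 1
    return feature_types, same_ids
```

On a real file, something like `summarize_gtf(open_gtf("genes.gtf"))` (the path is a placeholder) returns both numbers. If `"gene"` does not appear among the feature-type counts, “Feature type = gene” cannot match anything in that file; if nearly every row has identical gene_id and transcript_id, “by gene” counts are really “by transcript”.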
With the Ensembl GTF, there was probably a chromosome mismatch. Ensembl chromosome names have a format like “1”, whereas most of the pre-built genome indexes for mapping tools use UCSC chromosome names with a format like “chr1”. See Mismatched Chromosome identifiers (and how to avoid them).
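One way to confirm this kind of mismatch is to compare the sequence names in the BAM header with those in column 1 of the GTF. The Python sketch below assumes you already have the BAM reference names as a set of strings (for example, copied from the @SQ lines of `samtools view -H` output). The helper names and the renaming rule are hypothetical and only cover primary chromosomes (numbered, X, Y, and MT to chrM); unplaced scaffolds are named differently by the two sources and are left untouched, so the linked FAQ remains the better guide for a complete fix.

```python
def sequence_names(gtf_lines):
    """Collect the distinct sequence (chromosome) names in column 1 of a GTF."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def ensembl_to_ucsc(name):
    """Hypothetical mapping for primary chromosomes only: '1' -> 'chr1', 'MT' -> 'chrM'."""
    if name == "MT":
        return "chrM"
    if name.isdigit() or name in {"X", "Y"}:
        return "chr" + name
    return name  # scaffolds/patches left unchanged; their UCSC names differ in other ways


def rename_gtf(gtf_lines):
    """Yield GTF lines with column 1 rewritten to UCSC-style names."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        chrom, rest = line.split("\t", 1)
        yield ensembl_to_ucsc(chrom) + "\t" + rest


def report_overlap(bam_names, gtf_lines):
    """Return (names shared by BAM and GTF, names found only in the GTF)."""
    gtf_names = sequence_names(gtf_lines)
    return bam_names & gtf_names, gtf_names - bam_names
```

An empty shared set means no annotated feature sits on any chromosome the reads were mapped to, which is consistent with every read landing in htseq-count’s `__no_feature` bucket and all gene counts being zero.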
GENCODE and iGenomes are the best sources for reference annotation when your genome/build version is supported by either. GENCODE GTFs can be loaded directly by URL. iGenomes archives need to be downloaded and unpacked locally first; then upload just the genes.gtf file to Galaxy.