GTF for mm10 (GRCm38, Mouse Dec 2011) on usegalaxy.org

Hello,
I ran an RNA-seq analysis with HISAT2 using the index available for mm10 (Mouse Dec. 2011, GRCm38/mm10). After mapping and generating raw counts with htseq-count, I obtained some gene IDs that could not be matched to either the NCBI RefSeq annotation or the GENCODE M18 (GRCm38.p6) annotation, which was suggested in the thread "HiSAT2 alignment to GRCm38? in Galaxy -- where to find mm10 reference annotation". For example, I could not map the genes Zfp71-rs1 and Sgol2. Could anybody help me fetch the right GTF and sort out these gene IDs?
Here is the code I used:

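library(rtracklayer)
library(dplyr)

# NCBI RefSeq annotation for mm10 (UCSC GTF). The GTF has one row per feature
# (transcript, exon, CDS, ...), so keep only transcript rows; otherwise each gene
# is repeated once per exon/CDS row (genes with several transcripts still repeat).
mm10refseqgene <- rtracklayer::import("D:/work/annotations/mm10.ncbiRefSeq.gtf")
mm10refseqgenedf <- as.data.frame(mm10refseqgene) %>%
  dplyr::filter(type == "transcript") %>%
  dplyr::mutate(GeneID = gene_name)

# DESeq2 results; the first column holds the gene IDs from htseq-count
deseqP60 <- read.csv("D:/work/RNASeqdata-Tcf7l2/deseq2_P60_Tcf7l2KO-reflevel_wildtype.csv")
colnames(deseqP60)[1] <- "GeneID"

mm10refseq_P60Tcf7l2deseq2 <- dplyr::left_join(deseqP60, mm10refseqgenedf, by = "GeneID")
write.csv(mm10refseq_P60Tcf7l2deseq2, "D:/work/RNASeqdata-Tcf7l2/deseq2_P60_Tcf7l2KO-reflevel_wildtype-refseqannotations.csv", row.names = FALSE)

# Same join against GENCODE M18, keeping only the "gene" rows (one per gene)
m18 <- rtracklayer::import("D:/work/annotations/gencode.vM18.annotation.gtf")
m18df <- as.data.frame(m18) %>%
  dplyr::filter(type == "gene") %>%
  dplyr::mutate(GeneID = gene_name)
m18_genecode_deseqP60 <- dplyr::left_join(deseqP60, m18df, by = "GeneID")
write.csv(m18_genecode_deseqP60, "D:/work/RNASeqdata-Tcf7l2/deseq2_P60_Tcf7l2KO-reflevel_wildtype-GenecodeM18annotations.csv", row.names = FALSE)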
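To list exactly which IDs fail to match, rather than scanning the joined table for empty annotation columns, a small base-R check can help. This is a minimal sketch: `find_unmatched()`, `deseq_ids`, and `annotation_names` are hypothetical names, and the toy vectors only stand in for `deseqP60$GeneID` and the `gene_name` column of an imported GTF.

```r
# Return the unique DESeq2 gene IDs that do not appear among the annotation's gene names.
find_unmatched <- function(deseq_ids, annotation_names) {
  ids <- unique(as.character(deseq_ids))
  ids[!ids %in% unique(as.character(annotation_names))]
}

# Toy stand-ins (hypothetical) for deseqP60$GeneID and m18df$gene_name
deseq_ids <- c("Tcf7l2", "Sgol2", "Zfp71-rs1", "Actb")
annotation_names <- c("Tcf7l2", "Sgo2a", "Actb")

unmatched <- find_unmatched(deseq_ids, annotation_names)
print(unmatched)
```

With the real tables, `dplyr::anti_join(deseqP60, m18df, by = "GeneID")` returns the same unmatched rows as a data frame.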

Thanks in advance,
Chaitali


Hi @Chaitali

The gene IDs that are not mapping are not native to mouse.

To learn more about the cross-species annotation, and which annotation tracks have more information about these genes, try a query against the mm10 genome at http://genome.ucsc.edu/.
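When there are many unmatched IDs, one way to speed up the lookup is to generate a browser search link per ID. A minimal sketch, assuming the UCSC `hgTracks` page accepts `db` and `position` query parameters with a gene symbol as the search term (as the browser's search box does); `ucsc_search_url()` is a hypothetical helper name, and each link should still be checked by hand.

```r
# Build one UCSC Genome Browser search URL per gene ID for the given assembly.
ucsc_search_url <- function(gene_ids, db = "mm10") {
  # URLencode() is not vectorised, so encode each ID separately
  encoded <- vapply(gene_ids, URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0("http://genome.ucsc.edu/cgi-bin/hgTracks?db=", db, "&position=", encoded)
}

urls <- ucsc_search_url(c("Sgol2", "Zfp71-rs1"))
print(urls)
```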

Searching the UCSC mm10 Genome Browser for one of the genes you referenced (Sgol2) returns these annotation choices:

Your search resulted in multiple matches. Please select a position:

Gencode Genes
Sgo2a (ENSMUST00000027202.8) at chr1:57995971-58025899 - Mus musculus shugoshin 2A (Sgo2a), transcript variant 1, mRNA. (from RefSeq NM_199007)
Ppp2ca (ENSMUST00000020608.2) at chr11:52098681-52127778 - Mus musculus protein phosphatase 2 (formerly 2A), catalytic subunit, alpha isoform (Ppp2ca), mRNA. (from RefSeq NM_019411)
Knstrn (ENSMUST00000134661.7) at chr2:118814003-118837696 - Mus musculus kinetochore-localized astrin/SPAG5 binding (Knstrn), mRNA. (from RefSeq NM_026412)
Spag5 (ENSMUST00000045026.3) at chr11:78301529-78322457 - Mus musculus sperm associated antigen 5 (Spag5), transcript variant 1, mRNA. (from RefSeq NM_017407)
Kif2c (ENSMUST00000065896.8) at chr4:117159639-117182639 - Mus musculus kinesin family member 2C (Kif2c), transcript variant 1, mRNA. (from RefSeq NM_134471)
Dynll1 (ENSMUST00000009157.3) at chr5:115297110-115300999 - Mus musculus dynein light chain LC8-type 1 (Dynll1), mRNA. (from RefSeq NM_019682)
Dynll1 (ENSMUST00000112090.1) at chr5:115298505-115300912 - Acts as one of several non-catalytic accessory components of the cytoplasmic dynein 1 complex that are thought to be involved in linking dynein to cargos and to adapter proteins that regulate dynein function. Cytoplasmic dynein 1 acts as a motor for the intracellular retrograde motility of vesicles and organelles along microtubules. May play a role in changing or maintaining the spatial distribution of cytoskeletal structures (By similarity). (from UniProt P63168)
NCBI RefSeq genes, curated subset (NM_*, NR_*, NP_* or YP_*)
NM_001177867.1 at chr1:57985340-58025897
NM_199007.2 at chr1:57995974-58026277
NM_001195687.1 at chr8:63924694-63952170
Mouse Aligned mRNA Search Results
BC023855 - Mus musculus shugoshin-like 2 (S. pombe), mRNA (cDNA clone IMAGE:5346771), partial cds.
BC044797 - Mus musculus shugoshin-like 2 (S. pombe), mRNA (cDNA clone IMAGE:4953838), partial cds.
BC052742 - Mus musculus shugoshin-like 2 (S. pombe), mRNA (cDNA clone MGC:63378 IMAGE:6833875), complete cds.
Non-Mouse Aligned mRNA Search Results
KJ903881 - Synthetic construct Homo sapiens clone ccsbBroadEn_13275 SGOL2 gene, encodes complete protein.
BC035764 - Homo sapiens shugoshin-like 2 (S. pombe), mRNA (cDNA clone IMAGE:5551931), partial cds.
BC048349 - Homo sapiens shugoshin-like 2 (S. pombe), mRNA (cDNA clone IMAGE:5555637), complete cds.
BC092412 - Homo sapiens shugoshin-like 2 (S. pombe), mRNA (cDNA clone MGC:102910 IMAGE:30383194), complete cds.
AB527364 - Synthetic construct DNA, clone: pF1KE0311, Homo sapiens SGOL2 gene for shugoshin-like protein 2, without stop codon, in Flexi system.
AM392582 - Synthetic construct Homo sapiens clone IMAGE:100002290 for hypothetical protein (SGOL2 gene).
AM392984 - Synthetic construct Homo sapiens clone IMAGE:100002291 for hypothetical protein (SGOL2 gene).
CU690576 - Synthetic construct Homo sapiens gateway clone IMAGE:100022748 5' read SGOL2 mRNA.
CU690577 - Synthetic construct Homo sapiens gateway clone IMAGE:100022748 3' read SGOL2 mRNA.
JU474441 - TSA: Macaca mulatta Mamu_377541 mRNA sequence.
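If a browser search points an unmatched ID at a different symbol in the annotation (in the Sgol2 search above, the Gencode Genes section lists Sgo2a), one option is to record those matches in a small lookup and update the IDs before re-running the join. This is a sketch only: `symbol_updates` and `update_symbols()` are hypothetical names, the single entry is taken from the search output above, and IDs without a confirmed match (such as Zfp71-rs1 here) are deliberately left unchanged.

```r
# Hand-built lookup (hypothetical): old ID -> symbol found via browser search.
symbol_updates <- c(Sgol2 = "Sgo2a")

# Replace IDs that have an entry in the lookup; leave all others as they are.
update_symbols <- function(ids, lookup) {
  ids <- as.character(ids)
  hit <- ids %in% names(lookup)
  ids[hit] <- unname(lookup[ids[hit]])
  ids
}

ids <- c("Tcf7l2", "Sgol2", "Zfp71-rs1")
updated <- update_symbols(ids, symbol_updates)
print(updated)
```

Keeping the lookup as an explicit table makes every renamed ID visible and easy to review, rather than silently rewriting symbols.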

Hope that helps!