MSTRG to Gene Name Conversion


We are trying to analyze our files following this RNA Seq tutorial:

Our DESeq2 outputs, however, give us MSTRG gene IDs, which are not very useful because the labels are only relevant internally. In researching how to convert MSTRG IDs to gene names, we found that a better “reference file to guide assembly” should be supplied to StringTie so that gene names are output. However, the Galaxy StringTie job fails when we use the Homo_sapiens_NCBI_GrCh38.tar.gz reference file. We also tried decompressing it to .tar, and separately tried using a DEXSeq annotation as the reference file, but StringTie failed in both cases. In addition, the option to use a built-in reference file is disabled. Currently, the only approach that successfully produces a DESeq2 output is running StringTie with no reference file, but then we are unsure how to convert the MSTRG gene IDs to gene names, or how to prevent MSTRG IDs from being output in the first place.

Any help is greatly appreciated; thanks in advance!
-Ananya


Hi Ananya_P,

Were you able to resolve this issue? What worked for you? We are having the same issue currently.



Hi Johnathon.

My apologies for the late reply. Unfortunately, we have not been able to resolve this issue. Please let me know if your team has found any helpful solutions.



Hello @Ananya_P @Johnathon_Anderson

Sorry the original question got missed.

This FAQ explains how/where to get a human reference annotation dataset that will work with these tools:

I also added some tags to your post that link to other Q&A about this. In short, you need a GTF dataset that matches the UCSC version of the genome/build (if you are mapping against the built-in genome indexes). The format must be correct or the tools will fail. The FAQ and its linked FAQs explain this in full detail and list common sources.

GRCh38 is the same genome as UCSC’s hg38.
GRCh37 is the same genome as UCSC’s hg19.

However, the “build” may differ between sources. Check the chromosome identifiers in your annotation and in your genome, and make sure they match.
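
If it helps, here is a rough Python sketch of that identifier check, run outside Galaxy on a few lines of the annotation. It only compares the first column of the GTF (the chromosome/sequence name) against a list of genome chromosome names. The function names, the example GTF lines, and the gene IDs are all made up for illustration; substitute lines from your own files.

```python
def gtf_seqnames(lines):
    """Collect the distinct values in column 1 of GTF lines, skipping comments and blanks."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_identifiers(gtf_names, genome_names):
    """Report which chromosome identifiers are shared and which appear on one side only."""
    gtf_names, genome_names = set(gtf_names), set(genome_names)
    return {
        "shared": sorted(gtf_names & genome_names),
        "only_in_gtf": sorted(gtf_names - genome_names),
        "only_in_genome": sorted(genome_names - gtf_names),
    }


# Hypothetical example: an annotation that names chromosomes "1" and "MT",
# compared against a UCSC-style genome that names them "chr1" and "chrM".
example_gtf = [
    "#!example header comment",
    '1\tsource\tgene\t100\t500\t.\t+\t.\tgene_id "GENE_A";',
    'MT\tsource\tgene\t50\t120\t.\t+\t.\tgene_id "GENE_B";',
]
ucsc_style_genome = ["chr1", "chrM"]

report = compare_identifiers(gtf_seqnames(example_gtf), ucsc_style_genome)
print(report)
```

An empty "shared" list, as in this made-up example, would mean the annotation and the genome name their chromosomes differently, which is the kind of mismatch to rule out before rerunning StringTie with the reference annotation.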