RNA Star and mouse Ensembl GRCm39 problem

ptrivedi · December 5, 2022, 11:10am

I have tried to run RNA STAR with Mus_musculus.GRCm39.108.gtf.gz as input for “Gene model (gff3,gtf) file for splice junctions” and it gives an error.

Interestingly, I can run RNA STAR successfully with gencode.vM10.annotation.gtf.gz as input for “Gene model (gff3,gtf) file for splice junctions”. However, tools like Annotate ID then give an error and Gene IDs cannot be converted to Gene Symbols.

Can someone please help me with this? Thanks.

igor · December 6, 2022, 12:16am

@ptrivedi
What reference genome do you use for mapping? Gencode vM10 is for GRCm38 (see GENCODE - Mouse Release M10), while Mus_musculus.GRCm39.108.gtf.gz is for the GRCm39 assembly.
A precomputed mm39 index is not available on usegalaxy.org or usegalaxy.org.au. Do you use Galaxy Europe?
STAR jobs can fail for several reasons, and without looking at the input files and job settings I can only guess. Another common issue is a mismatch between the chromosome names in the reference genome and in the annotation. Genomes in Galaxy use “UCSC style” chromosome names (chr1, chr2, etc.), while some annotations use 1, 2, etc. These are different text strings, so features in the annotation do not match any sequence in the genome. If you use a custom mm39 genome from the history, STAR jobs might also fail because of insufficient memory.
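
One way to check for the chromosome-name mismatch described above is to compare the sequence names in the genome FASTA with those in the first column of the GTF. Below is a minimal Python sketch of that check. The function names (`fasta_seqnames`, `gtf_seqnames`, `ensembl_to_ucsc`) are made up for illustration and are not part of Galaxy or STAR, and the renaming helper assumes only the primary chromosomes (1, 2, ..., X, Y, MT) need converting.

```python
import re


def fasta_seqnames(lines):
    """Collect sequence names from FASTA headers (text after '>' up to the first whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines):
    """Collect the values of the first GTF column, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def ensembl_to_ucsc(name):
    """Rename a primary chromosome from '1'/'X'/'MT' style to 'chr1'/'chrX'/'chrM' style.

    Assumption: anything else (scaffolds, patches) is returned unchanged,
    because those names cannot be converted by simply adding a prefix.
    """
    if name == "MT":
        return "chrM"
    if re.fullmatch(r"\d+|X|Y", name):
        return "chr" + name
    return name
```

Both collectors accept any iterable of text lines, for example a file opened with `gzip.open(path, "rt")`. If the two sets share no names, the annotation and the genome use different naming styles. Unplaced scaffolds are named differently in UCSC and Ensembl files and would need a proper lookup table rather than this helper.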

I assume you mean AnnotateMyIDs. I tested it on a count table made with an old Gencode annotation, vM4, which is for mm10/GRCm38. In the count table, the Ensembl gene IDs contain versions (.1, .2, etc.), while the example input file in the tool description does not have versions. I stripped the versions from the count table by replacing the dots with a tab character, basically splitting the first column into two. This makes the gene IDs compatible with AnnotateMyIDs. There is no such issue if you use non-default settings, such as gene_name for the gene identifier; in that case just change the input type in AnnotateMyIDs to Symbol.
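
The version stripping can also be done outside Galaxy before uploading the table. Here is a minimal Python sketch, assuming a tab-separated count table with the gene ID in the first column. It removes the version suffix instead of splitting the column in two, which is a slight variation on what I did in Galaxy. The function names are made up for illustration.

```python
import re

# A version suffix is a dot followed by digits at the very end of the ID,
# e.g. the ".4" in "ENSMUSG00000000001.4".
VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(gene_id):
    """Return the gene ID without a trailing '.<number>' version, if present."""
    return VERSION_SUFFIX.sub("", gene_id)


def strip_versions_in_table(lines):
    """Strip the version from the first tab-separated column of each line."""
    cleaned = []
    for line in lines:
        first, sep, rest = line.rstrip("\n").partition("\t")
        cleaned.append(strip_version(first) + sep + rest)
    return cleaned
```

Header lines and IDs without a version pass through unchanged, so the sketch can be applied to the whole table.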

Kind regards,
Igor

ptrivedi · December 10, 2022, 1:45am

@igor

Thanks very much for your response. Sorry for the delay in my reply.

I am new to Galaxy and bioinformatics. Now it makes complete sense why Mus_musculus.GRCm39.108.gtf.gz failed to run on RNA STAR. I am using usegalaxy.org.au and, as you pointed out, it does not have a precomputed mm39 index.

I am attaching a screenshot of my RNA STAR input just for your reference.

Thanks for suggesting a solution for AnnotateMyIDs. I will get back to you once I try that.

regards,
Prerak

igor · December 11, 2022, 11:15pm

Hi Prerak,
Maybe have a look at the end-to-end analysis tutorials at Galaxy Training!
The upload part is somewhat complicated, but the tutorial can be done on datasets instead of collections. Unless you are aiming for single-cell RNA-Seq, I recommend HISAT2 and featureCounts.
Kind regards,
Igor
