Ensembl gene annotation GTF for rat: problem with RNA STAR

Hi
I am trying to map reads with RNA STAR. I used the built-in rat genome rn6 and downloaded the annotation GTF from Ensembl, but when running STAR I got an error: “no valid exon lines in the GTF file.” I had already removed the header lines.
I don’t know if I am using the wrong GTF.
I also tried getting the annotation file from the UCSC Main tool, but I got a bunch of IDs that were not recognized by annotateMyIDs, so I might have used the wrong annotation table. I don’t know which one is the correct one.
I’d appreciate some pointers. Thanks

Hi @ed_bahnson

Getting the reference data right can be a bit complicated!

We have a guide here that covers the current human assemblies, and most of this would apply for rat assemblies, too.

And this guide outlines how to organize the data for RNA-seq analysis, and really for most other analysis paths (it might be overkill, but it is never a bad idea).

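I’m guessing that there was either a formatting problem, a chromosome identifier “mismatch” problem, or both with RNA STAR. The built-in rn6 index uses UCSC-style chromosome names (chr1, chrM), while Ensembl GTFs use names without the prefix (1, MT), so STAR may not find any exon lines on chromosomes it recognizes.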
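A quick way to test for both problems outside Galaxy is to count the exon lines per chromosome name in the GTF and compare those names with the ones in the genome. This is only a minimal sketch: the function names are made up for illustration, the example lines and IDs are invented, and the genome name list is an assumption about what a UCSC-style rn6 index would contain.

```python
from collections import Counter


def exon_counts_by_chrom(gtf_lines):
    """Count exon features per sequence name, skipping comments and malformed lines."""
    counts = Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or header/comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # a GTF data line has exactly 9 tab-separated columns
        if fields[2] == "exon":
            counts[fields[0]] += 1
    return counts


def compare_names(gtf_counts, genome_names):
    """Split the GTF sequence names into those present in / absent from the genome."""
    genome_names = set(genome_names)
    shared = sorted(n for n in gtf_counts if n in genome_names)
    only_gtf = sorted(n for n in gtf_counts if n not in genome_names)
    return shared, only_gtf


# Invented example lines in Ensembl style (no "chr" prefix); the IDs are placeholders.
gtf = [
    "#!genome-build Rnor_6.0\n",
    '1\tensembl\texon\t100\t200\t.\t+\t.\tgene_id "ENSRNOG00000000001"; transcript_id "ENSRNOT00000000001";\n',
    '1\tensembl\tgene\t100\t500\t.\t+\t.\tgene_id "ENSRNOG00000000001";\n',
    'MT\tensembl\texon\t10\t90\t.\t+\t.\tgene_id "ENSRNOG00000000002"; transcript_id "ENSRNOT00000000002";\n',
]
genome = ["chr1", "chr2", "chrM"]  # assumed UCSC-style names

counts = exon_counts_by_chrom(gtf)
shared, only_gtf = compare_names(counts, genome)
print("exon lines per name:", dict(counts))
print("names shared with genome:", shared)
print("names only in GTF:", only_gtf)
```

If the shared list comes back empty even though exon lines exist, the names do not match and the GTF needs to come from the same source as the genome (or be converted). If no exon lines are counted at all, the formatting is the more likely culprit.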

You should also be getting the UCSC annotation from the downloads area instead of the Table Browser. Notice the warning when GTF output is selected there – in Table Browser GTF output the gene_id attribute just repeats the transcript_id, and this will impact most things you will do in Galaxy. The downloads area will have that correct.
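To see whether a given GTF has the gene_id problem, one option is to measure how often gene_id and transcript_id carry the same value. The sketch below assumes standard `key "value";` attributes in column 9; the helper names and the example lines are invented for illustration.

```python
import re

# Matches attribute pairs of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(col9):
    """Turn the 9th GTF column into a dict of attribute name -> value."""
    return dict(ATTR.findall(col9))


def fraction_gene_equals_transcript(gtf_lines):
    """Fraction of lines (having both attributes) where gene_id == transcript_id."""
    total = same = 0
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue
        attrs = parse_attributes(fields[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            total += 1
            same += attrs["gene_id"] == attrs["transcript_id"]
    return same / total if total else 0.0


# Invented lines: the first mimics the repeated-ID pattern, the second has a distinct gene_id.
repeated = ['chr1\tsrc\texon\t1\t9\t.\t+\t.\tgene_id "NM_000001"; transcript_id "NM_000001";\n']
distinct = ['chr1\tsrc\texon\t1\t9\t.\t+\t.\tgene_id "Abc1"; transcript_id "NM_000001";\n']

print(fraction_gene_equals_transcript(repeated))
print(fraction_gene_equals_transcript(distinct))
```

A value close to 1.0 across a whole file suggests the gene_id column holds transcript IDs, so gene-level counts and gene ID conversion downstream would not behave as expected.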

For the annotateMyIDs tool, there is likely something else going on, probably a mismatch between the gene naming scheme in your data and the ID type selected in the tool.
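If it is unclear which naming scheme a list of IDs uses, a rough tally by pattern can help before picking the ID type in the tool. This is a heuristic sketch only: the patterns are my assumptions about common ID shapes (Ensembl, RefSeq, Entrez), the labels are not the tool's own option names, and the example IDs are placeholders apart from the gene symbol.

```python
import re
from collections import Counter

# Assumed shapes of common identifier schemes; checked in order, first match wins.
PATTERNS = [
    ("ensembl_gene", re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")),
    ("ensembl_transcript", re.compile(r"^ENS[A-Z]*T\d{11}(\.\d+)?$")),
    ("refseq_transcript", re.compile(r"^[NX][MR]_\d+(\.\d+)?$")),
    ("entrez_gene", re.compile(r"^\d+$")),
]


def classify(identifier):
    """Return a label for the first matching pattern, else 'symbol_or_other'."""
    for label, pattern in PATTERNS:
        if pattern.match(identifier):
            return label
    return "symbol_or_other"


def tally(identifiers):
    """Count how many identifiers fall under each label."""
    return Counter(classify(i.strip()) for i in identifiers if i.strip())


# Placeholder IDs for illustration
ids = ["ENSRNOG00000000001", "ENSRNOT00000000001.2", "NM_000001", "24152", "Actb"]
print(tally(ids))
```

A list dominated by one label points to the ID type to select; a mixed list usually means the annotation came from more than one source.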

In short, review your identifiers. The guide above explains how to find these in different kinds of data: BAMs and others.

Hopefully this helps to solve this, but if you need more help, we’ll need to know the exact genome choices and parameters used per tool, and what those input files look like. You can share the history or screenshots as long as all the details are included.

Hope this helps! :slight_smile:

Thank you very much!
