RNA STAR alignment with SARS-COV-2 genome annotation - error message

Ermela1 · February 9, 2021, 7:36pm

Hello,
I am trying to align my RNA-seq data using RNA STAR and the GTF file I downloaded from NCBI. I tried a file from GENCODE as well. The links are below

ftp://ftp.ebi.ac.uk/pub/databases/gencode/covid19_trackhub/data/

I keep getting the same error message:

Fatal INPUT FILE error, no valid exon lines in the GTF file: /data/dnb03/galaxy_db/files/2/f/f/dataset_2ff93edb-4e10-47af-8a34-4a2262379bde.dat
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.

Feb 09 20:06:52 … FATAL ERROR, exiting

I tried both GTF files. I made sure they were unzipped before uploading. I also tried removing the first 4 lines, which are comments.
I was using the genome file provided by Galaxy for SARS-CoV-2, but it doesn't have an attached GTF file, so I was importing these. I am using the galaxy.eu server.
I was able to align the same RNA-seq files with RNA STAR to the human genome (hg19) using the annotation from http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz, which I found on this support thread (Help for Differential Expression Analysis).

When I look at the beginning of the files, the human and the COVID files look the same (after I removed the comment lines). The third line in both files is “features”.

Any help is appreciated!

Ermela

wm75 · February 9, 2021, 10:10pm

Hi Ermela and welcome here,

a simple explanation for the EBI/GENCODE link: this file does not provide SARS-CoV-2 annotations, but annotations of human genes related to COVID-19 (the disease).

For the NCBI annotations: I guess you should take the very first line of the error message literally: there are no exon lines in that file.
This is because SARS-CoV-2 doesn’t really have exons. It has subgenomic RNAs and peptides cleaved from precursors, and RNA STAR is not prepared to handle such feature types.
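
One way to check this before running STAR is to tally the feature types in column 3 of the annotation file. The following is only a minimal sketch: the function name `count_gtf_features` and the example lines are made up for illustration and are not taken from the NCBI file.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the feature types (column 3) of GTF/GFF lines, skipping comments."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment/header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column annotation line
        counts[fields[2]] += 1
    return counts


# Illustrative lines only (hypothetical sequence name and coordinates).
example_lines = [
    "#!hypothetical-header\n",
    'viral_genome\tdemo\tgene\t100\t900\t.\t+\t.\tgene_id "g1";\n',
    'viral_genome\tdemo\tCDS\t100\t900\t.\t+\t0\tgene_id "g1";\n',
]

feature_counts = count_gtf_features(example_lines)
exon_lines = feature_counts.get("exon", 0)
print(feature_counts)
print("exon lines:", exon_lines)
```

For a real file, an open file handle can be passed instead of the list, e.g. `count_gtf_features(open("annotation.gtf"))`, where the path is a placeholder. If no `exon` entry shows up in the tally, the error message above is the expected outcome.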

In general, I’m also not sure that STAR is the right aligner for this type of data. It may well make too many splice-machinery-specific assumptions to do a good job here.
There’s also the question of which kind of input data you are trying to analyze. Assigning reads to subgenomic RNA species makes most sense for long-read data.
We’ve tried this kind of thing before and actually have a public workflow on usegalaxy.eu that uses minimap2 for the job: Galaxy | Europe | Published Workflow | SARS-CoV-2: map ONT reads to transcripts

wm75 · February 9, 2021, 10:14pm

The preprint of the work using this workflow is https://www.biorxiv.org/content/10.1101/2020.07.18.204362v1
and the minimap2 settings should essentially be those used in https://www.cell.com/cell/fulltext/S0092-8674(20)30406-2

Ermela1 · February 10, 2021, 5:13pm

Thanks for the fast response. I think you're right to some degree, but I also figured out that the chromosome naming convention may have been contributing to the error message. I found this excerpt in the STAR manual, which may help some people.

2.2.2 Which annotations to use?
The use of the most comprehensive annotations for a given species is strongly recommended. Very
importantly, chromosome names in the annotations GTF file have to match chromosome names in the
FASTA genome sequence files. For example, one can use ENSEMBL FASTA files with ENSEMBL
GTF files, and UCSC FASTA files with UCSC GTF files. However, since UCSC uses chr1, chr2,
… naming convention, and ENSEMBL uses 1, 2, … naming, the ENSEMBL and UCSC FASTA
and GTF files cannot be mixed together, unless chromosomes are renamed to match between the
FASTA and GTF files.
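
The naming check described in this excerpt can be sketched in a few lines of Python. This is a rough illustration, not something from the STAR manual: the function names and the example lines are hypothetical, and it assumes the sequence name is the text after `>` up to the first whitespace in the FASTA header.

```python
def fasta_names(lines):
    """Collect sequence names from FASTA headers (text after '>' up to whitespace)."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_seqnames(lines):
    """Collect the values of column 1 (sequence name) from GTF lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment/header line
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_lines, gtf_lines):
    """Report which sequence names are shared and which appear in only one file."""
    fa = fasta_names(fasta_lines)
    gtf = gtf_seqnames(gtf_lines)
    return {"shared": fa & gtf, "only_fasta": fa - gtf, "only_gtf": gtf - fa}


# Illustrative input: UCSC-style name in the FASTA, Ensembl-style name in the GTF.
demo_fasta = [">chr1 some description\n", "ACGTACGT\n"]
demo_gtf = ['1\tdemo\texon\t1\t8\t.\t+\t.\tgene_id "g1";\n']

report = compare_names(demo_fasta, demo_gtf)
print(report)
```

An empty `shared` set, as in this example, means no annotation line can be matched to a sequence in the genome, which is the mismatch the manual warns about.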

I will definitely try minimap2 first; I think it may do the trick. For my experiment, I infected 293T cells with SARS-CoV-2 and sent them for RNA-seq. When I got the data back, a lot of the reads from the infected cells did not map to the human genome (about 70% unmapped reads vs. 5% in the uninfected control). I wanted to map these reads to the SARS-CoV-2 genome, both as a sanity check and maybe to show how heavily infected these cells are.
