STAR GTF file error for newbie

EriKoy · April 19, 2023, 1:10am

Hello! I am an undergrad and am in desperate need of any assistance, as my mentor is not familiar with bioinformatics!

I am using Galaxy to do a differential gene expression analysis of RNA-seq data and am following Galaxy’s tutorial (Reference-based RNA-Seq data analysis). I have 12 pairs of paired-end FASTQ files from HeLa cells. I did QC but did not run Cutadapt, as one online source stated that it was not necessary if I would be using STAR to map my sequences. (Do you think that this was a smart decision?)

Right now, I am trying to use STAR to map my sequences with the Homo_sapiens.GRCh38.109.gtf annotation file, but I keep getting the following error: “Fatal INPUT FILE error, no valid exon lines in the GTF file: /jetstream2/scratch/main/jobs/49599174/inputs/dataset_b28934b5-2b19-” I used a file from Ensembl and retried with one from UCSC, but both gave the same error. What should I do for my next step?

Thank you so much in advance for your help! I really appreciate it!!

igor · April 19, 2023, 5:13am

Hi @EriKoy
Popular aligners use soft clipping: they ignore unmappable nucleotides at the start and end of a read. You’ll see it in the CIGAR string, e.g. 12S50M means 12 soft-clipped nucleotides at the start of the alignment followed by 50 aligned positions (M covers both matches and mismatches). For more information, check the SAM format specification. You can assess the effect of adapters by comparing counts from the original and the trimmed reads. What is important: treat all samples in the same way.
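
To make the CIGAR notation concrete, here is a minimal sketch in Python that splits a CIGAR string into its operations and reports how many bases were soft clipped at each end. The function names are my own, chosen for illustration; in practice a library such as pysam would give you this from the BAM directly.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def soft_clipped_bases(cigar):
    """Return (bases clipped at start, bases clipped at end) for one alignment."""
    # Hard clips (H) may sit outside soft clips, so drop them first.
    ops = [(n, op) for n, op in parse_cigar(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    start = ops[0][0] if ops[0][1] == "S" else 0
    end = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (start, end)


if __name__ == "__main__":
    print(parse_cigar("12S50M"))
    print(soft_clipped_bases("12S50M"))
```

If many reads show long soft clips at one end, that is a hint that adapters are present and trimming may be worth comparing.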

We do see an elevated rate of errors when RNA_STAR is used with gene annotations. Gapped aligners can map reads across splice sites without gene annotations.

Personally, I prefer a two-step approach: mapping first, then read counting. Some tools have built-in gene models for popular organisms; see 1: RNA-Seq reads to counts
I usually import the annotation myself, so I know exactly what is used.
RNA_STAR provides very limited control over read counting, while featureCounts and htseq-count allow selection of attributes and features.
It is hard to say why the job failed without checking the annotation file. The annotation and the reference genome should come from the same version of the genome assembly and use identical chromosome/contig names; for example, chr1, Chr1 and 1 are treated as three different names.
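
A quick way to check the naming point is to compare the sequence names on the exon lines of the GTF with the sequence names in the genome FASTA. Below is a minimal sketch in Python; the helper names and the tiny in-memory example are made up for illustration, and this is not STAR's own validation code. On real data you would open your downloaded files instead of the StringIO objects.

```python
import io


def gtf_exon_seqnames(handle):
    """Collect sequence names (column 1) from the exon lines of a GTF."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        fields = line.rstrip("\n").split("\t")
        # A GTF line has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) >= 9 and fields[2] == "exon":
            names.add(fields[0])
    return names


def fasta_seqnames(handle):
    """Collect sequence names (first word after '>') from a FASTA file."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def compare_names(gtf_names, fasta_names):
    """Report which exon sequence names are found in the genome and which are not."""
    return {
        "shared": sorted(gtf_names & fasta_names),
        "gtf_only": sorted(gtf_names - fasta_names),
    }


# Hypothetical example: Ensembl-style name in the GTF, UCSC-style name in the FASTA.
EXAMPLE_GTF = (
    "#!genome-build GRCh38\n"
    '1\thavana\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'
    '1\thavana\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n'
)
EXAMPLE_FASTA = ">chr1 example sequence\nACGTACGT\n"

if __name__ == "__main__":
    gtf_names = gtf_exon_seqnames(io.StringIO(EXAMPLE_GTF))
    fasta_names = fasta_seqnames(io.StringIO(EXAMPLE_FASTA))
    print(compare_names(gtf_names, fasta_names))
```

If "shared" comes back empty, none of the exon lines refer to a sequence in the genome, which is the kind of mismatch (1 versus chr1) described above. An empty result from gtf_exon_seqnames would instead point to a file that is not tab-separated GTF or that has no exon features at all.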
Maybe try the HISAT2/featureCounts approach described in the tutorial above on one sample to see if it works for you.
Hope that helps.
Kind regards,
Igor

EriKoy · April 23, 2023, 11:29pm

Thank you so much for your help @igor! I was able to successfully map my reads using HiSAT2/featureCounts!!

I have a quick question about my MDS plot from limma. I have two replicates for each condition, but the plot shows the replicates separated from each other. Do you think that this is something that I should be concerned about?

igor · April 23, 2023, 11:52pm

Hi @EriKoy
MDS plots are discussed in depth in 2: RNA-seq counts to genes

Hope that helps.

Kind regards,

Igor

EriKoy · April 24, 2023, 5:08pm

Dear @igor,

Thank you so much for your advice! I was able to make a volcano plot and successfully finished my senior thesis because of your help!

Thanks again, and please take care,

Erica
