Error Using human hg38 in Reference-based RNA-Seq data analysis

jcorchero · April 28, 2024, 6:36pm

Hi there!
Thanks for such a great tutorial. However, I tried to follow it with human data and could not get the RNA STAR step to work.

I got the error message below, but I cannot figure it out. Could you please help me with this?
Thanks

Galaxy Tool Error Report

from https://usegalaxy.eu/

Error Localization

Dataset	129983688 (4838ba20a6d8676541b1f0c338a6e6d7)
History	3214167 (5a989024b1c40faf)
Failed Job	150: RNA STAR on data 108, data 81, and data 80: log (4838ba20a6d86765a8d82224b9b0b96a)

User Provided Information

The user redacted (user: 87468) provided the following information:

Fatal error: Matched on FATAL ERROR
Fatal INPUT FILE error, no valid exon lines in the GTF file: /data/dnb09/galaxy_db/files/e/2/e/dataset_e2ef103f-6927-4189-844c-ee4c686de0af.dat
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.
Apr 28 13:55:54 … FATAL ERROR, exiting
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe

I am using the RNA STAR tool with this genome version (https://ftp.ensembl.org/pub/release-111/gtf/homo_sapiens/Homo_sapiens.GRCh38.111.chr.gtf.gz) and have that error message. Could you please help me out to fix that error? Thanks

Detailed Job Information

Job environment and execution information is available at the job info page.

Job ID	69115183 (11ac94870d0bb33adaddfbf32a1e9058)
Tool ID	toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.11a+galaxy0
Tool Version	2.7.11a+galaxy0
Job PID or DRM id	49433078
Job Tool Version	None

Job Execution and Failure Information

Command Line

STAR --runThreadN ${GALAXY_SLOTS:-4} --genomeLoad NoSharedMemory --genomeDir '/data/db/data_managers/rnastar/2.7.4a/hg38/hg38/dataset_412f3413-7e68-407c-9652-ff4e935abf5a_files' --sjdbOverhang 100 --sjdbGTFfile '/data/dnb09/galaxy_db/files/e/2/e/dataset_e2ef103f-6927-4189-844c-ee4c686de0af.dat' --sjdbGTFfeatureExon 'exon' --readFilesIn '/data/dnb10/galaxy_db/files/c/1/0/dataset_c1037ada-c74b-4ee9-8f41-cf3c5f0202a2.dat' '/data/dnb10/galaxy_db/files/0/0/1/dataset_00174e26-d136-484e-a01b-a9ee477236f1.dat' --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode None '' --quantMode GeneCounts --outSAMattrIHstart 1 --outSAMattributes NH HI AS nM ch --outSAMprimaryFlag OneBestScore --outSAMmapqUnique 60 --outSAMunmapped Within --outBAMsortingThreadN ${GALAXY_SLOTS:-4} --outBAMsortingBinsN 50 --winAnchorMultimapNmax 50 --limitBAMsortRAM $((${GALAXY_MEMORY_MB:-0}*1000000)) --outWigType 'bedGraph' '' --outWigStrand 'Stranded' --outWigReferencesPrefix '-' --outWigNorm 'RPM' && samtools view -b -o '/data/jwd02f/main/069/115/69115183/outputs/dataset_19396f4f-26fd-4bb4-a85e-a548087f627d.dat' Aligned.sortedByCoord.out.bam && mv Signal.Unique.str1.out.bg Signal.Unique.str1.out && mv Signal.UniqueMultiple.str1.out.bg Signal.UniqueMultiple.str1.out && mv Signal.Unique.str2.out.bg Signal.Unique.str2.out && mv Signal.UniqueMultiple.str2.out.bg Signal.UniqueMultiple.str2.out

stderr

Fatal INPUT FILE error, no valid exon lines in the GTF file: /data/dnb09/galaxy_db/files/e/2/e/dataset_e2ef103f-6927-4189-844c-ee4c686de0af.dat
Solution: check the formatting of the GTF file. One likely cause is the difference in chromosome naming between GTF and FASTA file.
Apr 28 13:55:54 … FATAL ERROR, exiting
gzip: stdout: Broken pipe
gzip: stdout: Broken pipe

stdout

/usr/local/tools/_conda/envs/mulled-v1-40c069a58b8570974e4581195144b4016c8d8f4255f4cbb822c5896056b567f4/bin/STAR-avx2 --runThreadN 10 --genomeLoad NoSharedMemory --genomeDir /data/db/data_managers/rnastar/2.7.4a/hg38/hg38/dataset_412f3413-7e68-407c-9652-ff4e935abf5a_files --sjdbOverhang 100 --sjdbGTFfile /data/dnb09/galaxy_db/files/e/2/e/dataset_e2ef103f-6927-4189-844c-ee4c686de0af.dat --sjdbGTFfeatureExon exon --readFilesIn /data/dnb10/galaxy_db/files/c/1/0/dataset_c1037ada-c74b-4ee9-8f41-cf3c5f0202a2.dat /data/dnb10/galaxy_db/files/0/0/1/dataset_00174e26-d136-484e-a01b-a9ee477236f1.dat --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode None --quantMode GeneCounts --outSAMattrIHstart 1 --outSAMattributes NH HI AS nM ch --outSAMprimaryFlag OneBestScore --outSAMmapqUnique 60 --outSAMunmapped Within --outBAMsortingThreadN 10 --outBAMsortingBinsN 50 --winAnchorMultimapNmax 50 --limitBAMsortRAM 51200000000 --outWigType bedGraph --outWigStrand Stranded --outWigReferencesPrefix - --outWigNorm RPM
STAR version: 2.7.11a compiled: 2023-09-15T02:58:53+0000 :/opt/conda/conda-bld/star_1694746407721/work/source
Apr 28 13:49:01 … started STAR run
Apr 28 13:49:01 … loading genome
Apr 28 13:55:11 … processing annotations GTF

Job Information

None

Job Traceback

None

This is an automated message. Do not reply to this address.

igor · April 28, 2024, 11:36pm

Hi @jcorchero
Check the standard error log file:

There is an issue with the annotation file: either its chromosome names differ from those in the reference genome, or it has no exon annotations. Galaxy's built-in human genomes use chr1, chr2, etc. What do you see in the annotation file? Is it, by any chance, 1, 2, etc.? If so, get a compatible annotation file or modify the chromosome names. Some tools might treat chr1 and Chr1 as different text strings. If the annotation file does use chr1, chr2, etc. for chromosome names, check the feature type in the third column and the attributes in the last column. Do you see exon annotations? I assume you used the built-in hg38 for mapping.
You can get compatible gene annotations from the UCSC Genome Browser or GENCODE.
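If you want to check the annotation file outside Galaxy, the questions above can be sketched as a few lines of Python. This is only a minimal sketch: it assumes a standard tab-separated GTF (optionally gzipped) with the sequence name in column 1 and the feature type in column 3, and the function names `count_exons_by_seqname` and `naming_style` are made up for illustration.

```python
import gzip
from collections import Counter


def count_exons_by_seqname(path):
    """Count 'exon' lines per sequence name (column 1) in a GTF file.

    Assumes a tab-separated GTF with 9 columns; files ending in .gz
    are read through gzip.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    exons = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header / comment line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue  # not a complete GTF record
            if fields[2] == "exon":
                exons[fields[0]] += 1
    return exons


def naming_style(seqnames):
    """Roughly classify sequence names by whether they carry a 'chr' prefix."""
    names = list(seqnames)
    if not names:
        return "no exon lines"
    with_prefix = sum(name.startswith("chr") for name in names)
    if with_prefix == len(names):
        return "chr prefix"
    if with_prefix == 0:
        return "no chr prefix"
    return "mixed"
```

Calling `naming_style(count_exons_by_seqname("annotation.gtf"))` (the file name is a placeholder) tells you whether the exon lines use chr1-style names. A result of "no chr prefix" against the built-in hg38 would be consistent with the STAR error above.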
Hope that helps.
Kind regards,
Igor

jcorchero · April 29, 2024, 5:48pm

Hi Igor,

Thank you very much for your response. You were absolutely right. The file I was using had a different naming convention for chromosomes. I downloaded the correct version from UCSC and it worked nicely. Later, I saw in the tutorial documentation that the files downloaded from Ensembl need further modification to be used with RNA STAR, which is exactly what you suggested. Thanks again. Have a good one!
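For anyone who would rather keep the Ensembl file, the kind of renaming involved could look roughly like the sketch below. It is not the tutorial's procedure, just an illustration under some assumptions: the GTF covers only the primary chromosomes (1-22, X, Y, MT), as the `.chr.gtf` download is expected to, and the target is UCSC-style naming, where the mitochondrion is chrM. Unplaced scaffolds follow a different naming scheme in UCSC and are not handled here. The function names are hypothetical.

```python
import gzip


def to_ucsc_name(seqname):
    """Map an Ensembl-style primary chromosome name to UCSC style."""
    if seqname == "MT":
        return "chrM"  # Ensembl calls the mitochondrion MT, UCSC chrM
    if seqname.startswith("chr"):
        return seqname  # already prefixed, leave unchanged
    return "chr" + seqname


def convert_gtf_line(line):
    """Rename the first column of one GTF line; pass comments through."""
    if line.startswith("#") or not line.strip():
        return line
    seqname, sep, rest = line.partition("\t")
    return to_ucsc_name(seqname) + sep + rest


def convert_gtf(src, dst):
    """Write a copy of src to dst with UCSC-style chromosome names.

    Paths ending in .gz are read or written through gzip.
    """
    open_src = gzip.open if str(src).endswith(".gz") else open
    open_dst = gzip.open if str(dst).endswith(".gz") else open
    with open_src(src, "rt") as fin, open_dst(dst, "wt") as fout:
        for line in fin:
            fout.write(convert_gtf_line(line))
```

The converted file would then be uploaded to Galaxy in place of the original. Downloading a matching annotation from UCSC or GENCODE, as I did, avoids the conversion entirely.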

Javier
