RNA-STAR and hg38 GTF reference annotation

reference-annotation
gtf
hg38
rna-star

#1

Hi, I am getting this error when trying to run RNA-STAR, how can I resolve it? Thank you.

Fatal error: Matched on FATAL ERROR

Transcriptome.cpp:48:Transcriptome: exiting because of INPUT FILE error: could not open input file /cvmfs/data.galaxyproject.org/managed/rnastar_index2/hg38/dataset_950901_files/exonGeTrInfo.tab
Solution: check that the file exists and you have read permission for this file
SOLUTION: utilize --sjdbGTFfile /path/to/annotantions.gtf option at the genome generation step or mapping step

Mar 06 00:51:59 … FATAL ERROR, exiting


#2

Hello,

You need to supply a reference annotation GTF dataset from the history at runtime.

The GTF should be based on the UCSC “hg38” genome build. Some choices:

  • For Gencode, copy the link to the GTF and paste it into the Upload tool. hg38 data is available at https://www.gencodegenes.org/. After it is loaded, remove the headers (lines that start with a “#”) with the Select tool, using the option “NOT Matching” and the regular expression ^# (no trailing characters). Once the formatting is fixed, change the datatype to gtf under Edit Attributes (pencil icon). The data will be given the datatype gff by default, which works with some tools but not with others. Avoid the gff3 version of this particular data: it contains duplicated IDs, and several RNA-seq tools do not work with annotation in that format anyway.
  • For iGenomes, the archive corresponding to the target genome/build needs to be locally downloaded, the tar archive unpacked, and then just the genes.gtf data uploaded to Galaxy (browse the local file, or use FTP). Find all available genome/builds here: https://support.illumina.com/sequencing/sequencing_software/igenome.html
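
If you prefer to remove the header lines locally before uploading, the same "NOT Matching ^#" filter can be sketched in Python. This is only an illustration, not part of Galaxy or STAR: the function names and any file paths you pass in are hypothetical, and it assumes the file is plain text or gzip-compressed with a `.gz` extension.

```python
import gzip


def strip_gtf_headers(lines):
    """Yield only the lines that do not start with '#'.

    Mirrors the Select tool run with "NOT Matching" and the regex ^#.
    """
    for line in lines:
        if not line.startswith("#"):
            yield line


def clean_gtf(src_path, dst_path):
    """Write a header-free, uncompressed copy of a GTF file.

    Assumption: a '.gz' suffix means the input is gzip-compressed.
    """
    opener = gzip.open if src_path.endswith(".gz") else open
    with opener(src_path, "rt") as src, open(dst_path, "w") as dst:
        dst.writelines(strip_gtf_headers(src))
```

For example, `clean_gtf("annotation.gtf.gz", "annotation.noheader.gtf")` would produce an uncompressed file that can be uploaded and assigned the gtf datatype.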

What exactly are the "built-in references" in Galaxy's HISAT2?
No Reference GFF file available in ClosestBed tool
How can I improve very low assigned rate in featureCounts?


#3

I did, but it was in gtf.gz format. I will reupload in gtf format in case this was why the annotation file could not be accessed.

Thanks.


#4

That should work. The datatype gtf.gz is not supported.
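
If you want to be sure the file is uncompressed before you upload it, a local check along these lines may help. It is a minimal sketch under stated assumptions: the helper names are hypothetical, and it only looks at the two-byte gzip signature rather than anything Galaxy-specific.

```python
import gzip
import shutil

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def is_gzipped(path):
    """Return True if the file begins with the gzip signature."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def decompress_if_needed(src_path, dst_path):
    """Write an uncompressed copy of src_path to dst_path.

    dst_path must differ from src_path; a plain file is copied as-is.
    """
    if is_gzipped(src_path):
        with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        shutil.copyfile(src_path, dst_path)
    return dst_path
```

Checking the signature instead of the file extension also catches files that were renamed to `.gtf` while still compressed.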

I’m wondering how this was loaded: gtf data in compressed format is uncompressed on Upload when “auto-detect” is used for “type”, and gtf.gz cannot be assigned directly as a datatype.

I’d be interested in taking a look at that dataset (even if deleted). What is the history name/dataset number? You don’t need to share the actual history link/content here.