Error when attempting stringtie assembly: no reference transcripts found. Using WBcel235 from NCBI

Hello, I’m trying to run stringtie as a precursor to analysis with dexseq2. For my annotation file, I’m using WBcel235 downloaded from NCBI.

When I run stringtie, I get the following error:

" An error occurred with this dataset: format gtf database ce11 WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences."

I get the gist of this - Galaxy somehow doesn’t like the format of my annotation file. But I am not entirely sure what is wrong with it, or how to fix it. I’ve copied the first few lines of my annotation file below.

Any help would be greatly appreciated.

#gtf-version 2.2
#!genome-build WBcel235
#!genome-build-accession NCBI_Assembly:GCF_000002985.6
#!annotation-source WormBase WS283
NC_003279.8 RefSeq gene 3747 3909 . - . gene_id "CELE_Y74C9A.6"; transcript_id ""; db_xref "GeneID:353377"; gbkey "Gene"; gene "Y74C9A.6"; gene_biotype "snoRNA"; locus_tag "CELE_Y74C9A.6";
NC_003279.8 RefSeq transcript 3747 3909 . - . gene_id "CELE_Y74C9A.6"; transcript_id "NR_001477.2"; db_x

Hi @mmchamp

The first thing I notice is the header lines (the ones starting with #) at the top of the GTF. Many tools will complain if those are not removed.
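
If it helps, here is a minimal Python sketch of that cleanup, assuming the header lines are exactly the ones that begin with `#`, as in the pasted file. The function name `strip_gtf_headers` is made up for illustration, and the demo runs on a few in-memory lines rather than a real file.

```python
def strip_gtf_headers(lines):
    """Yield GTF lines, dropping '#' header/comment lines and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        yield line


# In-memory stand-in for the top of the annotation file (columns are tab-separated).
sample = [
    "#gtf-version 2.2\n",
    "#!genome-build WBcel235\n",
    'NC_003279.8\tRefSeq\tgene\t3747\t3909\t.\t-\t.\tgene_id "CELE_Y74C9A.6";\n',
]

cleaned = list(strip_gtf_headers(sample))
print(len(cleaned), "feature line(s) kept")
```

For a real file, the same function can be fed an open file handle and the result written to a new file that is then uploaded to Galaxy.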

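Next, check that the sequence IDs in the reference annotation GTF and in the reference genome used for mapping match exactly. Your GTF names the sequence NC_003279.8, while the error mentions database ce11, and UCSC's ce11 uses names like chrI.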

  • Galaxy indexed genomes are usually sourced from UCSC and they also host matching reference annotation. Example: Index of /goldenPath/ce11/bigZips/genes.
  • If you used a custom genome (fasta) instead, use an annotation that pairs with that version of the assembly.
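
To see the mismatch for yourself, a small Python sketch like the following compares the sequence IDs in column 1 of the GTF against the FASTA header names. All function names here are hypothetical, the data are tiny in-memory stand-ins, and the `NC_003279.8` to `chrI` mapping is an assumption that should be checked against the assembly report before renaming anything. Using an annotation that already matches the genome remains the simpler route.

```python
def gtf_seqids(lines):
    """Collect the sequence IDs (column 1) from GTF lines, skipping '#' lines."""
    return {
        line.split("\t", 1)[0]
        for line in lines
        if line.strip() and not line.startswith("#")
    }


def fasta_seqids(lines):
    """Collect IDs from FASTA headers: text after '>' up to the first whitespace."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def rename_seqids(lines, mapping):
    """Yield GTF lines with column 1 replaced wherever it appears in `mapping`."""
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqid, rest = line.split("\t", 1)
        yield mapping.get(seqid, seqid) + "\t" + rest


# Tiny in-memory stand-ins for the real annotation and genome files.
gtf = ['NC_003279.8\tRefSeq\tgene\t3747\t3909\t.\t-\t.\tgene_id "CELE_Y74C9A.6";\n']
fasta = [">chrI\n", "GCCTAAGCCTAAGCCTAAGC\n"]

# IDs used by the annotation that the genome does not contain.
missing = gtf_seqids(gtf) - fasta_seqids(fasta)
print("GTF IDs absent from the genome:", sorted(missing))

# ASSUMPTION: accession-to-UCSC-name mapping; verify before using on real data.
ASSUMED_MAPPING = {"NC_003279.8": "chrI"}
renamed = list(rename_seqids(gtf, ASSUMED_MAPPING))
still_missing = gtf_seqids(renamed) - fasta_seqids(fasta)
print("Still absent after renaming:", sorted(still_missing))
```

If the first set is non-empty, StringTie has no reference transcripts on the sequences where reads were mapped, which is what the warning describes.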