StringTie "no reference transcript". Solutions? Or need new alignments?

jste · April 2, 2020, 6:18pm

What I’m trying to do: differential gene expression (DEG) analysis with wheat sequences using HISAT2 → StringTie → DESeq2.

The problem: StringTie fails to use the GTF file I provide and gives this warning:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!
Please make sure the -G annotation file uses the same naming convention for the genome sequences.

I’ve noticed several other threads about this, often involving wheat, but with no clear (to me) follow-up solutions.

What I did before this point:

1. Uploaded the full cDNA FASTA for one chromosome from Ensembl and ran NormalizeFasta as per these instructions. Using the full genome FASTA results in memory failures, and mapping to a single chromosome’s FASTA fails to set metadata, every time.
2. Ran HISAT2.
3. Uploaded a GFF3 file, also from Ensembl, converted to GTF format.
4. Ran StringTie using the HISAT2 BAM file and the GTF file.

My thought is that the problem arises because I have to align to cDNA, otherwise the metadata will not be set. I was hoping StringTie would take the cDNA FASTA gene names from the HISAT2 BAM file, which are exactly the same as the GTF gene_id attribute, and make the connection. But that seems wrong now:

Is it possible to change my GTF file to get StringTie to work with it and a BAM file made from a cDNA fasta file? I’ve tried everything to get the alignment with a full chromosome sequence FASTA, but I can’t get it to work without the cDNA file, so I’m at a loss.
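For anyone hitting the same warning: StringTie compares the reference sequence names in the BAM against the first column (seqname) of the GTF, not against the gene_id attribute. A minimal sketch for checking that overlap before running StringTie is below. It assumes the aligner names each reference by the first whitespace-delimited token of the FASTA header. The function names and the toy records (`TX1.1`, `TX2.1`, `1A`) are hypothetical, made up for illustration only.

```python
import io


def fasta_names(handle):
    """Collect sequence names: the first token of each '>' header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gtf_seqnames(handle):
    """Collect the values of column 1 (seqname) from a GTF, skipping comments."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta_handle, gtf_handle):
    """Report which names are shared and which appear in only one file."""
    fa = fasta_names(fasta_handle)
    gtf = gtf_seqnames(gtf_handle)
    return {"shared": fa & gtf, "fasta_only": fa - gtf, "gtf_only": gtf - fa}


# Hypothetical toy data: a cDNA-style FASTA (one record per transcript)
# and a genome-style GTF (seqname is the chromosome).
example_fasta = io.StringIO(">TX1.1 cdna\nACGT\n>TX2.1 cdna\nGGCC\n")
example_gtf = io.StringIO(
    '1A\tensembl\texon\t1\t100\t.\t+\t.\tgene_id "TX1"; transcript_id "TX1.1";\n'
)

report = compare_names(example_fasta, example_gtf)
print(report)
```

If `shared` comes back empty, as in this toy cDNA-versus-genome case, StringTie has nothing to match and the warning above is the expected result. With real files, open them with `open(path)` in place of the `io.StringIO` objects.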

Thank you

jste · April 9, 2020, 9:09am

FYI, this is the solution. Several of the tools in Galaxy don’t seem to work well with “larger” FASTA references, even if a specific error isn’t mentioned. So for wheat I had to use half of the FASTA sequence of a full chromosome, which gave me a 250 MB file. With this I could successfully assign my data to this reference after normalization.
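Roughly, the halving step can be sketched as below. This is only an illustration under assumptions, not the exact procedure used here: it keeps the first half of a single-record FASTA and keeps the original sequence name so that a GTF seqname can still match. The function names and the toy record are hypothetical, and for a real chromosome you would likely stream the file rather than hold it in memory.

```python
import io


def read_single_fasta(handle):
    """Read exactly one FASTA record; return (header, sequence)."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected exactly one FASTA record")
            header = line[1:]
        else:
            chunks.append(line)
    if header is None:
        raise ValueError("no FASTA header found")
    return header, "".join(chunks)


def first_half(header, sequence, width=60):
    """Return FASTA text for the first half of the sequence and the cut position."""
    cut = len(sequence) // 2
    name = header.split()[0]  # keep only the name token, as NormalizeFasta-style cleanup would
    half = sequence[:cut]
    lines = [f">{name}"]
    lines += [half[i:i + width] for i in range(0, len(half), width)]
    return "\n".join(lines) + "\n", cut


# Hypothetical toy record standing in for a chromosome.
header, seq = read_single_fasta(io.StringIO(">1A dna\nACGTACGT\nTTTT\n"))
half_text, cut = first_half(header, seq)
print(half_text, cut)
```

Note that annotation features with coordinates beyond `cut` no longer fall inside the truncated sequence, so genes in the discarded half cannot be quantified against this reference.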

jennaj · April 13, 2020, 10:28pm

@jste

Yes, the wheat genome assembly is very large and will fail for memory reasons on public Galaxy servers. This can happen no matter how you run the job or what resources are allocated. Many tools simply cannot handle the chromosome length when creating indexes, whether pre-computed (native genome) or on-demand (custom genome).

If you are willing to use an alternative version of the Triticum genome that has been re-organized by PLAZA, along with a matched GFF annotation, please see: https://bioinformatics.psb.ugent.be/plaza/. Example: https://bioinformatics.psb.ugent.be/plaza/versions/plaza_v4_5_monocots/organism/view/Triticum+aestivum

There are plans to add all PLAZA genomes to usegalaxy.org but that is still a work-in-progress. You might also consider running your own Galaxy and installing the genome there (indexed for tools). Ticket with more details and links if interested: https://github.com/galaxyproject/usegalaxy-playbook/issues/187

Thanks for posting back what worked, and hopefully this extra info provides more options. Cloudman Galaxy is a popular choice for scientists. Galaxy itself is always free but commercial storage/computation resources are usually not. AWS has always offered simple-to-apply-online grants for research/learning purposes, plus they have recently expanded that program.

I added a few more tags to your post in case that interests you. Full resources can also be found here:
