Hi, I’m new to Galaxy and trying to perform DEG analysis from RNA-seq for 3 conditions with 3 reps (9 total samples), but I have fallen off track by getting empty columns for gene counts at the StringTie step. The workflow has been FASTQC, Trimmomatic, HISAT2, StringTie, StringTie merge, StringTie (using the StringTie merge output as the reference annotation) >> DESeq2. I searched topics and found a very similar issue that was solved at Gene count file was empty after Stringtie, but I could not find the posted solution. Any suggestions to point me in the right direction would be appreciated.
Yes, the other reply was handled offline, so we don’t have the details of the resolution. But you can do the same – share your history and we can take a look and hopefully help.
As a guess – the problems with this pipeline usually involve some mismatch between the reference genome and reference annotation, or some problem with the reference annotation formatting. I added a few tags that might help but we can troubleshoot directly, too. Your choice.
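For readers who want to test the genome/annotation mismatch guess outside of Galaxy, here is a minimal Python sketch. It compares the sequence names in column 1 of a GTF against the names in FASTA header lines. The function names and the toy data are hypothetical, and it assumes both files are available as plain text. A typical mismatch is `chr1` in one file and `1` in the other.

```python
def gtf_seqnames(gtf_lines):
    """Collect the sequence names (column 1) used by GTF records."""
    names = set()
    for line in gtf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        names.add(line.split("\t", 1)[0])
    return names


def fasta_seqnames(fasta_lines):
    """Collect sequence names from FASTA headers (text up to the first whitespace)."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def compare_seqnames(gtf_lines, fasta_lines):
    """Report which sequence names are shared and which appear in only one file."""
    gtf = gtf_seqnames(gtf_lines)
    fasta = fasta_seqnames(fasta_lines)
    return {
        "shared": sorted(gtf & fasta),
        "gtf_only": sorted(gtf - fasta),
        "fasta_only": sorted(fasta - gtf),
    }


# Toy data (made up for illustration): the GTF uses "chr1", the FASTA uses "1".
toy_gtf = [
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr2\tsrc\texon\t300\t400\t.\t-\t.\tgene_id "g2"; transcript_id "t2";',
]
toy_fasta = [">1 example description", "ACGT", ">2", "ACGT"]

report = compare_seqnames(toy_gtf, toy_fasta)
print(report)
```

If `shared` comes back empty or very small, the annotation and the genome probably do not use the same naming scheme, and counts against that annotation can end up empty.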
Any persistent problems can be reported in a new question for community help. Be sure to provide enough context so others can review the situation exactly and quickly offer advice.
Thanks for the quick response.
Here is a link to the history of my latest run.
The first run through StringTie was done correctly, but the second run has the problem.
This prior Q&A (and what is linked from it) has the details. Stringtie merge error - #4 by jennaj
Please give that a review, and try again. It should solve your problems as well. The part about running the external annotation through separately, before combining it with the others, is important, so don’t skip that.
The Help on the tool form explains more but is often missed. The tutorial linked from the bottom of the tool form also has a detailed walk-through, with example data and a workflow template that you could also decide to use.
I am facing the same issue. I looked at the history and did exactly the same thing. Has anyone figured it out?
Hi again! Thanks for the helpful info and tutorials. They are really helping me get a better understanding of the mechanics of the software, workflow, and formatting. I did notice that the GTF file I downloaded from UCSC did not have a header but ended up with one after processing with StringTie merge. For pre-processing the GTF file alone with StringTie merge, does it matter if it goes as a transcript or reference input?
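For anyone comparing a GTF before and after StringTie merge, a small Python sketch like the one below can summarize what changed: how many comment/header lines there are, which feature types appear in column 3, and how many records lack a `transcript_id` attribute. The function name and toy lines are hypothetical, and the header line in the toy data is a placeholder, not StringTie's actual output. It does not settle which input slot to use; it only makes the formatting differences visible.

```python
from collections import Counter


def summarize_gtf(lines):
    """Return (header_lines, feature_counts, records_missing_transcript_id)."""
    header = []
    features = Counter()
    missing_transcript_id = 0
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            header.append(line)  # comment/header line
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            features["<malformed>"] += 1  # a GTF record should have 9 tab-separated columns
            continue
        features[cols[2]] += 1
        if "transcript_id" not in cols[8]:
            missing_transcript_id += 1
    return header, features, missing_transcript_id


# Toy data (made up): one placeholder header line, one transcript, two exons.
toy_lines = [
    "# placeholder header comment",
    'chr1\tsrc\ttranscript\t100\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    'chr1\tsrc\texon\t400\t500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]

header, features, missing = summarize_gtf(toy_lines)
print(len(header), dict(features), missing)
```

Running it on the UCSC download and on the merged output side by side shows whether only the header differs or whether the records themselves were rewritten.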
Hello, I went over the helpful Q&A StringTie merge error notes and pre-processed the GTF file from UCSC separately with StringTie merge before running it as the external annotation, and I still have StringTie gene counts coming up empty. The first 3 out of 9 do have counts, which tells me something is working, but I still think I am missing something. Any chance you could take another look? Here is the link to my latest history: Galaxy
Thanks for sharing the history with your question, very helpful!
When running HISAT2, try setting the advanced options to output some information specific to the alignments that StringTie can make use of.
The exact instructions are in a few places in the tutorials.
- All tutorials that include HISAT2 Galaxy Training! (this is linked at the bottom of the tool form in the Help section)
- This is a good choice since it also has help for determining strand (I didn’t see that step in your history, but noticed you are picking reverse strand). You might already know the strand (and I agree it is reverse for your case), so that part is mostly for others reading later on. But the step right after discusses HISAT2 usage. In short, know the strand, then also tell HISAT2 how to report the hits with splice-aware metrics in the BAM: De novo transcriptome reconstruction with RNA-Seq
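One way to sanity-check the HISAT2 output is to see whether spliced alignments carry the `XS:A:+/-` strand tag, which the HISAT2 manual describes as the field StringTie needs on spliced alignments. The sketch below is a rough Python check, not part of any tutorial. It assumes the BAM has been converted to SAM text first, the function name and toy records are hypothetical, and it counts spliced reads (CIGAR containing `N`) and how many of them lack the tag. On the command line, the transcript-assembler reporting setting most likely corresponds to HISAT2's `--dta` flag, but the check only looks at the tags, not at which options were used.

```python
def spliced_without_xs(sam_lines):
    """Return (spliced_alignments, spliced_alignments_missing_XS_tag)."""
    spliced = 0
    missing = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        if "N" not in cols[5]:
            continue  # CIGAR has no skipped region, so the read is not spliced
        spliced += 1
        # Optional fields start at column 12 (index 11).
        if not any(tag.startswith("XS:A:") for tag in cols[11:]):
            missing += 1
    return spliced, missing


# Toy SAM records (made up): r1 spliced with XS, r2 unspliced, r3 spliced without XS.
toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t5M100N5M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1\tXS:A:-",
    "r2\t0\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
    "r3\t0\tchr1\t500\t60\t4M50N6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNH:i:1",
]

spliced, missing = spliced_without_xs(toy_sam)
print(f"spliced alignments: {spliced}, missing XS tag: {missing}")
```

A large share of spliced reads without the tag suggests the alignments were not produced in a form StringTie can make full use of, and rerunning HISAT2 as described in the tutorial is the next thing to try.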
Hope that helps!
ps: I would like you to consider using Collections. If interested, please see this tutorial. It is somewhat similar to what you are doing now, so it will not be that tedious or awful to go through… plus learning how these other methods are both different and the same (!) will help you make analysis decisions going forward. Reference-based RNA-Seq data analysis