Genome wide alternate splicing analysis - IsoformSwitchAnalyzeR error from StringTie input - no CDS

bellez34r3 · July 4, 2024, 3:35am

Hi there,

I have been attempting to run a genome-wide isoform analysis as per the tutorial here: Hands-on: Genome-wide alternative splicing analysis / Genome-wide alternative splicing analysis / Transcriptomics

I am using my own data. I have encountered a problem at the step of importing data to IsoformSwitchAnalyzeR. The error is shown below:

Step 1 of 2: importing GTF (this may take a while)…
Step 2 of 2: Adding ORF…
Error in addORFfromGTF(SwitchList, removeNonConvensionalChr = args$removeNonConvensionalChr, :
No ORFs could be added to the switchAnalyzeRlist. Please ensure GTF file have CDS info (and that isoform ids match).

To my knowledge, StringTie does not output CDS features, only “transcript” and “exon”. Is this causing the error? Can you please advise on the best approach to fixing this?

The input GTF file is from Ensembl and does have CDS features.
However, the annotation generated by StringTie/StringTie merge does not contain any CDS.
It’s being run on the Galaxy Au server.

Please let me know if I can add any more info.

Thanks so much for your advice,

Anna

jennaj · July 5, 2024, 4:48pm

Welcome, @bellez34r3

That tutorial has several data preparation steps after the StringTie step. Are you also doing those for your own data? If not, I would suggest starting there.

Carina_RCh · August 6, 2024, 6:00pm

Hi Anna, I hope you are very well!

I am running the same pipeline as you and I encountered the same problem. Were you able to add CDS to the StringTie output, or what path did you follow to solve this issue?

I hope you can help me. Thank you very much!

jennaj · August 6, 2024, 6:33pm

Welcome, @Carina_RCh (and @bellez34r3 can still reply of course!)

As far as I know, the error reported above can be due to not using a reference annotation with StringTie. Specifically, the tool is trying to match up transcript identifiers, aka “isoform ids”, between the different input files. What is your use case?
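For anyone who wants to check the identifier matching outside of Galaxy, here is a minimal Python sketch. It assumes the two files being compared are a GTF annotation and a transcript FASTA (which files the tool actually compares is an assumption here), and the example records and IDs are made up for illustration, not taken from real data.

```python
import re

# Matches the transcript_id attribute in column 9 of a GTF record.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_transcript_ids(lines):
    """Collect the set of transcript_id values found in GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of each FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


# Made-up example records (hypothetical IDs).
gtf_lines = [
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";\n',
    'chr1\tStringTie\texon\t100\t900\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";\n',
    'chr2\tStringTie\ttranscript\t50\t700\t.\t-\t.\tgene_id "MSTRG.3"; transcript_id "ENST00000000001";\n',
]
fasta_lines = [">MSTRG.1.1 gene=MSTRG.1\n", "ACGTACGT\n", ">MSTRG.2.1\n", "TTGGCCAA\n"]

gtf_ids = gtf_transcript_ids(gtf_lines)
fa_ids = fasta_ids(fasta_lines)

shared = gtf_ids & fa_ids
only_in_gtf = gtf_ids - fa_ids
only_in_fasta = fa_ids - gtf_ids

print("shared:", sorted(shared))
print("only in GTF:", sorted(only_in_gtf))
print("only in FASTA:", sorted(only_in_fasta))
```

On real data, the same functions can be given open file handles instead of the lists above. A large "only in" set on either side would suggest the isoform ids do not line up between the inputs.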

Carina_RCh · August 6, 2024, 8:55pm

Hi Jennaj

Thanks for replying…
In my case, as a reference annotation I am using the output of the StringTie merge described in this pipeline step:

I have done all the steps to generate the annotation file, but I get the same error as Anna.

I understand that StringTie does not include “CDS” features in its output file, but IsoformSwitchAnalyzeR requires them to import the data.

bellez34r3 · August 6, 2024, 9:51pm

Hi all,

Apologies for the delay in replying. I also tried following the steps.
It was ok when making transcript coordinates, reference transcriptome annotation, and transcriptome quantification with StringTie.

I am yet to find a solution. I haven’t been able to get past the “import data” step in IsoformSwitchAnalyzeR. I’m pretty new to this, but from what I gather, the error in the “import data” step comes up because the reference transcriptome annotation generated with StringTie has no “CDS” features listed, and those are needed for IsoformSwitchAnalyzeR to work? The only features it lists (from looking at the file) are “transcript” and “exon”.

If you find a solution, that would be great.
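Rather than checking the file by eye, the feature types can be counted with a few lines of Python on a downloaded copy of the GTF. This is only a sketch: it assumes a standard tab-separated GTF with the feature type in column 3, and the example records below are made up in the shape of StringTie output.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the values in column 3 (feature type) of GTF-formatted lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A complete GTF record has 9 tab-separated columns.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Made-up example records (not real data).
example = [
    "# example header line\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n',
    'chr1\tStringTie\texon\t100\t400\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n',
    'chr1\tStringTie\texon\t600\t900\t1000\t+\t.\tgene_id "G1"; transcript_id "G1.1";\n',
]

feature_counts = count_gtf_features(example)
print(dict(feature_counts))
print("CDS present:", feature_counts["CDS"] > 0)
```

For a real file, pass an open file handle, for example `count_gtf_features(open("merged.gtf"))`, where `merged.gtf` is a hypothetical file name.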

Cheers,

Anna

bellez34r3 · August 6, 2024, 9:59pm

Hi Jenna,

In my case, I have used a reference GTF file (Ensembl) as the guide for assembly in StringTie. I checked and it does have CDS features; however, the “assembled transcript coordinates” generated with StringTie have no CDS in the output. Is there a reason this is lost during transcriptome assembly that I can fix?

Thanks so much!

Anna

jennaj · August 7, 2024, 12:00am

Hi @bellez34r3

StringTie is a discovery tool, and it doesn’t call (or annotate) coding regions in the transcripts it discovers.

See the process at this specific step in the tutorial you referenced. https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/differential-isoform-expression/tutorial.html#hands-on-transcriptome-assembly-with-stringtie

The other tools are important, and the options used differ between the runs through StringTie. Each sample is run through twice: once for discovery per sample, after which the results are merged to remove redundancy, and then again using the merged result as a new “reference only” set. The gffread steps are also important: the first run gathers the inputs needed to call CDS regions, and the second run actually gets those regions captured in the final annotation.

If possible, maybe use the workflow included with that tutorial as a template? Or, at least some parts of it? Or you could run the tutorial data through to create a sort of “reference history” then review the tools and parameters applied, compare the different files produced, and learn where your process is different?
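When comparing files between a “reference history” and your own history, a small script can make the difference obvious. The sketch below assumes two GTF files have been downloaded and read into strings. It reports the per-feature-type counts side by side and lists the types present in the first file but missing from the second. The example text is made up for illustration.

```python
from collections import Counter


def feature_counts(gtf_text):
    """Count feature types (column 3) in GTF text."""
    counts = Counter()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        if len(fields) >= 9:
            counts[fields[2]] += 1
    return counts


def compare_feature_types(reference_text, own_text):
    """Return (feature, count_in_reference, count_in_own) rows, sorted by feature."""
    ref = feature_counts(reference_text)
    own = feature_counts(own_text)
    return [(f, ref[f], own[f]) for f in sorted(set(ref) | set(own))]


# Made-up example GTF text (not real data).
attrs = 'gene_id "G1"; transcript_id "G1.1";'
reference_text = "\n".join([
    f"chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\t{attrs}",
    f"chr1\tsrc\texon\t100\t900\t.\t+\t.\t{attrs}",
    f"chr1\tsrc\tCDS\t200\t800\t.\t+\t0\t{attrs}",
])
own_text = "\n".join([
    f"chr1\tsrc\ttranscript\t100\t900\t.\t+\t.\t{attrs}",
    f"chr1\tsrc\texon\t100\t900\t.\t+\t.\t{attrs}",
])

rows = compare_feature_types(reference_text, own_text)
missing = [feature for feature, in_ref, in_own in rows if in_ref > 0 and in_own == 0]

for feature, in_ref, in_own in rows:
    print(f"{feature}\t{in_ref}\t{in_own}")
print("missing from own file:", missing)
```

Running this on the equivalent output from each step of the two histories should show the first step at which your own files diverge from the tutorial's.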

Let us know if you need more help or if you sort this out. It should definitely work.

bellez34r3 · August 7, 2024, 12:45am

Thank you!

I will definitely run through the process with the data provided for the tutorial itself so I can get a handle on the inputs and outputs at each step. Then I can try to narrow down where the issue is.
Thanks so much for your help. If I come up with a solution after doing this I will post it here for Carina as well.

Cheers,

Anna

jennaj · August 7, 2024, 1:53am

Hi @bellez34r3

Great, thank you!

To help a bit with this, I also tried to run the tutorial data through the workflow and discovered a tiny wrinkle… the input data and workflow are not a perfect match, and a small adjustment is needed. This issue ticket contains all of the details → Suggested update to workflow for differential-isoform-expression · Issue #5208 · galaxyproject/training-material · GitHub. Follow-up about this proposed change will be posted there.

Update: A history with the tutorial completed is here (uses the modified workflow) https://usegalaxy.eu/u/jenj/h/genome-wide-alternative-splicing-analysis-human-modified-for-gtn

The basic analysis steps are unchanged, and valid, if you refer to the hands-on portion of the tutorial. And, if you want to use the tutorial’s workflow for some other reason – edit away! You would need to make changes to handle your own collections anyway.
