Genome-wide alternative splicing analysis - IsoformSwitchAnalyzeR error from StringTie input - no CDS

Hi there,

I have been attempting to run a genome-wide isoform analysis following the tutorial here: Hands-on: Genome-wide alternative splicing analysis / Transcriptomics

I am using my own data. I have encountered a problem at the step of importing data to IsoformSwitchAnalyzeR. The error is shown below:

Step 1 of 2: importing GTF (this may take a while)…
Step 2 of 2: Adding ORF…
Error in addORFfromGTF(SwitchList, removeNonConvensionalChr = args$removeNonConvensionalChr, :
No ORFs could be added to the switchAnalyzeRlist. Please ensure GTF file have CDS info (and that isoform ids match).

To my knowledge, StringTie does not output CDS features, only “transcript” and “exon”. Is this causing the error? Can you please advise on the best approach to fix this?

The reference GTF file is from Ensembl and does contain CDS features :slight_smile:
However, the annotation generated by StringTie/StringTie merge does not contain any CDS.
It’s being run on the Galaxy Au server.

Please let me know if I can add any more info :slight_smile:

Thanks so much for your advice,

Anna

Welcome, @bellez34r3

That tutorial has several data preparation steps after the Stringtie step. Are you also doing those for your own data? If not, I would suggest starting there.

Hi Anna, I hope you are very well!

I am running the same pipeline as you and encountered the same problem. Were you able to add CDS to the StringTie output? If not, what path did you follow to solve this issue?

I hope you can help. Thank you very much!

Welcome, @Carina_RCh (and @bellez34r3 can still reply of course!)

As far as I know, the error reported above can be due to not using a reference annotation with StringTie. Specifically, the tool is trying to match up transcript identifiers (aka “isoform ids”) between the different input files. What is your use case?
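One way to see whether the “isoform ids” line up is to compare the transcript IDs in the annotation with those in the quantification output. The following is only a minimal sketch, not how IsoformSwitchAnalyzeR does its matching: the function names, the example records, and the example IDs are all invented for illustration, and it assumes a standard 9-column GTF with a `transcript_id "..."` attribute.

```python
import re

# Standard GTF attribute syntax: transcript_id "SOME_ID";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids_from_gtf(lines):
    """Collect the set of transcript_id values found in GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(annotation_ids, quant_ids):
    """Report which IDs are shared and which appear on one side only."""
    return {
        "shared": annotation_ids & quant_ids,
        "only_in_annotation": annotation_ids - quant_ids,
        "only_in_quantification": quant_ids - annotation_ids,
    }


# Invented example records (hypothetical IDs, not real data).
example_gtf = [
    "\t".join(["chr1", "StringTie", "transcript", "100", "900", "1000", "+", ".",
               'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1";']),
    "\t".join(["chr1", "example", "transcript", "2000", "2900", ".", "-", ".",
               'gene_id "GENE_EXAMPLE"; transcript_id "ENST_EXAMPLE_1";']),
]
# Hypothetical IDs as they might appear in a quantification table.
example_quant_ids = {"MSTRG.1.1", "MSTRG.2.1"}

annotation_ids = transcript_ids_from_gtf(example_gtf)
report = compare_ids(annotation_ids, example_quant_ids)
for key, values in report.items():
    print(key, sorted(values))
```

With real data, the GTF lines would come from an open file handle and the quantification IDs from whichever column holds the transcript names in your files. If the shared set is empty or small, the inputs probably came from different annotation versions or different steps of the workflow.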

Hi Jennaj

Thanks for replying…
In my case, as a reference annotation I am using the output of the StringTie merge described in this pipeline step:

I have done all the steps to generate the annotation file, but I get the same error as Anna.

I understand that StringTie does not include “CDS” features in its output file, but IsoformSwitchAnalyzeR requires them to import the data.

Hi all,

Apologies for the delay in replying. I also tried following the steps.
The steps that generate the assembled transcript coordinates, the reference transcriptome annotation, and the transcriptome quantification with StringTie all ran without problems.

I am yet to find a solution. I haven’t been able to get past the “import data” step in IsoformSwitchAnalyzeR. I’m pretty new to this, but from what I gather, the error in the “import data” step occurs because the reference transcriptome annotation generated with StringTie has no “CDS” features, which IsoformSwitchAnalyzeR needs in order to work? The only features it lists (from looking at the file) are “transcript” and “exon”.
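Rather than checking the file by eye, the feature types can be tallied with a short script. This is just a sketch under the assumption of a tab-separated, 9-column GTF where the third column is the feature type; the function name and the example records are made up for illustration.

```python
from collections import Counter


def count_gtf_features(lines):
    """Count the values in the third (feature) column of GTF lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        counts[fields[2]] += 1
    return counts


# Invented StringTie-style records (hypothetical IDs and coordinates).
example_gtf = [
    "# example header line",
    "\t".join(["chr1", "StringTie", "transcript", "100", "900", "1000", "+", ".",
               'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1";']),
    "\t".join(["chr1", "StringTie", "exon", "100", "400", "1000", "+", ".",
               'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1";']),
    "\t".join(["chr1", "StringTie", "exon", "600", "900", "1000", "+", ".",
               'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "2";']),
]

feature_counts = count_gtf_features(example_gtf)
has_cds = "CDS" in feature_counts
print(dict(feature_counts), "CDS present:", has_cds)
```

To run it on a real annotation, pass an open file handle instead of the example list, e.g. `count_gtf_features(open("annotation.gtf"))`, where `annotation.gtf` is a placeholder for a GTF downloaded from the Galaxy history.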

If you find a solution that would be great :slight_smile:

Cheers,

Anna

Hi Jenna,

In my case, I used a reference GTF file (Ensembl) as the guide for assembly in StringTie. I checked and it does have CDS features; however, the “assembled transcript coordinates” output generated with StringTie has no CDS. Is there a reason this is lost during transcriptome assembly, and is it something I can fix?

Thanks so much!

Anna

Hi @bellez34r3

Stringtie is a discovery tool, and it doesn’t call (or annotate) newly predicted coding regions in discovered transcripts.

See the process at this specific step in the tutorial you referenced. https://training.galaxyproject.org/training-material/topics/transcriptomics/tutorials/differential-isoform-expression/tutorial.html#hands-on-transcriptome-assembly-with-stringtie

The other tools matter, and the options differ between the StringTie runs. Each sample is run through StringTie twice: once for per-sample discovery, after which those assemblies are merged to remove redundancy, and then again using the merged result as a new “reference only” set. The gffread steps are also important: the first run gathers the inputs needed to call CDS regions, and the tool is run again to capture those regions in the final annotation.

If possible, maybe use the workflow included with that tutorial as a template? Or, at least some parts of it? Or you could run the tutorial data through to create a sort of “reference history” then review the tools and parameters applied, compare the different files produced, and learn where your process is different?

Let us know if you need more help or if you sort this out. It should definitely work. :slight_smile:

Thank you!

I think for sure I will run through the process with the data provided for the tutorial itself so I can get a handle on the inputs and outputs at each step. Then I can try and narrow down where the issue is :slight_smile:
Thanks so much for your help. If I come up with a solution after doing this I will post it here for Carina as well.

Cheers,

Anna

Hi @bellez34r3

Great, thank you!

To help a bit with this, I also tried to run the tutorial data through the workflow and discovered a tiny wrinkle… the input data and workflow are not a perfect match, and a small adjustment is needed. This issue ticket contains all of the details → Suggested update to workflow for differential-isoform-expression · Issue #5208 · galaxyproject/training-material · GitHub. Follow-up on this proposed change will be posted there.

Update: A history with the tutorial completed is here (uses the modified workflow) https://usegalaxy.eu/u/jenj/h/genome-wide-alternative-splicing-analysis-human-modified-for-gtn

The basic analysis steps are unchanged, and valid, if you refer to the hands-on portion of the tutorial. And, if you want to use the tutorial’s workflow for some other reason – edit away! You would need to make changes to handle your own collections anyway.