Workflow for differential exon usage/alternative splicing

HanMue · December 10, 2019, 5:43pm

Hi everyone,

I am looking for a workflow to identify and visualize differential exon usage/alternative splicing in mouse RNASeq data in Galaxy.
What I did up to now was a de novo transcriptome reconstruction to identify novel splice sites, e.g. isoforms:

Alignment with HISAT2 (GTF file with known splice sites and with the option ‘transcripts tailored for StringTie’)
StringTie with a Reference Annotation (-G option) and “Use Reference transcripts only” set: NO to discover novel transcripts
StringTie merge with a Reference Annotation (-G option) for the transcriptome assembly
StringTie again, this time with the StringTie merge.gtf as Reference Annotation and “Use Reference transcripts only” set: YES, as the novel transcripts are now included in the file. Now I have 3 StringTie-outputs per sample:

Assembled transcripts
Transcript counts
Gene counts

Additionally,I annotated the StringTie merge output with GffCompare to get an overview of e.g. novel isoforms.

Now I need some help with the downstream analysis.
Is DEXSeq the best tool to compare different conditions for differential splicing/exon usage? I have 2 more questions about DEXSeq:

Which files should I take as input files for DEXeq? Can I use one of the StringTie outputs (Transcript count files?) or do I need another tool for preparing the counts before using DEXSeq?
DEXSeq needs a ‘GTF file created from DEXSeq-Count’ - as far as I understood, DEXSeq-Count extracts the exons and corresponding gene_ids. I tried to use the StringTie merge.gtf and the GffCompare annotated transcripts.gtf in DEXSeq-Count prepare annotation-mode. It ends with the following error:

An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/dexseq/dexseq_count/1.28.1.0 .

  File "/usr/local/tools/_conda/envs/__bioconductor-dexseq@1.28.1/bin/dexseq_prepare_annotation.py", line 56, in <module>
    exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
  File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__
KeyError: 'Non-stranded index used for stranded GenomicArray.'

My data is single-end and unstranded. Do you know which other gtf should be taken as input and what could be wrong with the gtfs I tried?
It would be great If someone knew how to proceed with the analysis!

Best

Hannah

HanMue · December 13, 2019, 4:32pm

I think I can now reduce the question to one point:

I try now using DEXSeq_Count in first prepare annotation - mode and second in count reads - mode. Then use the counting files in DEXSeq.
My data is single end and unstranded and I think the problem ist, that DEXSeq_Count in prepare annotation- mode expects a stranded gtf.file and that’s probybly why I get this error:

Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/__bioconductor-dexseq@1.28.1/bin/dexseq_prepare_annotation.py", line 56, in <module>
    exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
  File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__
KeyError: 'Non-stranded index used for stranded GenomicArray.'

Does anyone know how to fix this?

Thank you!

Hannah

HanMue · December 16, 2019, 10:08am

ping @hexylena - just in case you know whether the strandedness could be an issue and how to set the tool on galaxy.eu for that?
Many thanks!

Topic		Replies	Views
Tool to use in place of Cuffdiff gtn-tutorial , transcriptomics , stringtie	15	969	January 25, 2021
Issues moving from StringTie into Seurat with scRNA-seq	0	464	June 18, 2020
Troubleshooting DexSeq transcriptomics , deseq2	10	245	November 2, 2023
DE analysis after StringTie transcriptomics , stringtie	1	650	December 2, 2020
StringTie Error (35, 'Process open FD table is full') usegalaxy.org support	1	641	May 31, 2020

Workflow for differential exon usage/alternative splicing

Related Topics