Workflow for differential exon usage/alternative splicing

Hi everyone,

I am looking for a workflow to identify and visualize differential exon usage/alternative splicing in mouse RNASeq data in Galaxy.
What I did up to now was a de novo transcriptome reconstruction to identify novel splice sites, e.g. isoforms:

  1. Alignment with HISAT2 (GTF file with known splice sites and with the option ‘transcripts tailored for StringTie’)
  2. StringTie with a Reference Annotation (-G option) and “Use Reference transcripts only” set: NO to discover novel transcripts
  3. StringTie merge with a Reference Annotation (-G option) for the transcriptome assembly
  4. StringTie again, this time with the StringTie merge.gtf as Reference Annotation and “Use Reference transcripts only” set: YES, as the novel transcripts are now included in the file. Now I have 3 StringTie-outputs per sample:
  • Assembled transcripts
  • Transcript counts
  • Gene counts

Additionally,I annotated the StringTie merge output with GffCompare to get an overview of e.g. novel isoforms.

Now I need some help with the downstream analysis.
Is DEXSeq the best tool to compare different conditions for differential splicing/exon usage? I have 2 more questions about DEXSeq:

  1. Which files should I take as input files for DEXeq? Can I use one of the StringTie outputs (Transcript count files?) or do I need another tool for preparing the counts before using DEXSeq?
  2. DEXSeq needs a ‘GTF file created from DEXSeq-Count’ - as far as I understood, DEXSeq-Count extracts the exons and corresponding gene_ids. I tried to use the StringTie merge.gtf and the GffCompare annotated transcripts.gtf in DEXSeq-Count prepare annotation-mode. It ends with the following error:

An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/dexseq/dexseq_count/1.28.1.0 .

  File "/usr/local/tools/_conda/envs/__bioconductor-dexseq@1.28.1/bin/dexseq_prepare_annotation.py", line 56, in <module>
    exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
  File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__
KeyError: 'Non-stranded index used for stranded GenomicArray.'

My data is single-end and unstranded. Do you know which other gtf should be taken as input and what could be wrong with the gtfs I tried?
It would be great If someone knew how to proceed with the analysis!

Best

Hannah

I think I can now reduce the question to one point:

I try now using DEXSeq_Count in first prepare annotation - mode and second in count reads - mode. Then use the counting files in DEXSeq.
My data is single end and unstranded and I think the problem ist, that DEXSeq_Count in prepare annotation- mode expects a stranded gtf.file and that’s probybly why I get this error:

Traceback (most recent call last):
  File "/usr/local/tools/_conda/envs/__bioconductor-dexseq@1.28.1/bin/dexseq_prepare_annotation.py", line 56, in <module>
    exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
  File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__
KeyError: 'Non-stranded index used for stranded GenomicArray.'

Does anyone know how to fix this?

Thank you!

Hannah

ping @hxr - just in case you know whether the strandedness could be an issue and how to set the tool on galaxy.eu for that?
Many thanks!