I am looking for a workflow to identify and visualize differential exon usage/alternative splicing in mouse RNASeq data in Galaxy.
What I did up to now was a de novo transcriptome reconstruction to identify novel splice sites, e.g. isoforms:
- Alignment with
HISAT2(GTF file with known splice sites and with the option ‘transcripts tailored for StringTie’)
StringTiewith a Reference Annotation (-G option) and “Use Reference transcripts only” set: NO to discover novel transcripts
StringTie mergewith a Reference Annotation (-G option) for the transcriptome assembly
StringTieagain, this time with the StringTie merge.gtf as Reference Annotation and “Use Reference transcripts only” set: YES, as the novel transcripts are now included in the file. Now I have 3 StringTie-outputs per sample:
- Assembled transcripts
- Transcript counts
- Gene counts
Additionally,I annotated the
StringTie merge output with
GffCompare to get an overview of e.g. novel isoforms.
Now I need some help with the downstream analysis.
DEXSeq the best tool to compare different conditions for differential splicing/exon usage? I have 2 more questions about
- Which files should I take as input files for DEXeq? Can I use one of the StringTie outputs (Transcript count files?) or do I need another tool for preparing the counts before using DEXSeq?
- DEXSeq needs a ‘GTF file created from DEXSeq-Count’ - as far as I understood,
DEXSeq-Countextracts the exons and corresponding gene_ids. I tried to use the StringTie merge.gtf and the GffCompare annotated transcripts.gtf in
DEXSeq-Countprepare annotation-mode. It ends with the following error:
An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/dexseq/dexseq_count/22.214.171.124 .
File "/firstname.lastname@example.org/bin/dexseq_prepare_annotation.py", line 56, in <module> exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] ) File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__ KeyError: 'Non-stranded index used for stranded GenomicArray.'
My data is single-end and unstranded. Do you know which other gtf should be taken as input and what could be wrong with the gtfs I tried?
It would be great If someone knew how to proceed with the analysis!