Hi everyone,
I am looking for a workflow to identify and visualize differential exon usage/alternative splicing in mouse RNASeq data in Galaxy.
What I did up to now was a de novo transcriptome reconstruction to identify novel splice sites, e.g. isoforms:
- Alignment with
HISAT2
(GTF file with known splice sites and with the option ‘transcripts tailored for StringTie’) -
StringTie
with a Reference Annotation (-G option) and “Use Reference transcripts only” set: NO to discover novel transcripts -
StringTie merge
with a Reference Annotation (-G option) for the transcriptome assembly -
StringTie
again, this time with the StringTie merge.gtf as Reference Annotation and “Use Reference transcripts only” set: YES, as the novel transcripts are now included in the file. Now I have 3 StringTie-outputs per sample:
- Assembled transcripts
- Transcript counts
- Gene counts
Additionally,I annotated the StringTie merge
output with GffCompare
to get an overview of e.g. novel isoforms.
Now I need some help with the downstream analysis.
Is DEXSeq
the best tool to compare different conditions for differential splicing/exon usage? I have 2 more questions about DEXSeq
:
- Which files should I take as input files for DEXeq? Can I use one of the StringTie outputs (Transcript count files?) or do I need another tool for preparing the counts before using DEXSeq?
- DEXSeq needs a ‘GTF file created from DEXSeq-Count’ - as far as I understood,
DEXSeq-Count
extracts the exons and corresponding gene_ids. I tried to use the StringTie merge.gtf and the GffCompare annotated transcripts.gtf inDEXSeq-Count
prepare annotation-mode. It ends with the following error:
An error occurred while running the tool toolshed.g2.bx.psu.edu/repos/iuc/dexseq/dexseq_count/1.28.1.0 .
File "/usr/local/tools/_conda/envs/__bioconductor-dexseq@1.28.1/bin/dexseq_prepare_annotation.py", line 56, in <module>
exons[f.iv] += ( f.attr['gene_id'], f.attr['transcript_id'] )
File "python2/src/HTSeq/_HTSeq.pyx", line 519, in HTSeq._HTSeq.GenomicArray.__getitem__
KeyError: 'Non-stranded index used for stranded GenomicArray.'
My data is single-end and unstranded. Do you know which other gtf should be taken as input and what could be wrong with the gtfs I tried?
It would be great If someone knew how to proceed with the analysis!
Best
Hannah