Detecting Novel Splice Variants with Kallisto

Ryan_Morris · January 20, 2021, 2:13pm

Dear all,

I am working on a project that involves detecting novel splice variants of Src kinase in brain tumour cell lines. Typically I would run kallisto on Galaxy to estimate the abundance of the transcripts of interest. I am having some trouble with my latest attempts because one of the neuronal Src variants is not currently annotated as a transcript, although it will be in due course (Making a custom Transcriptome).

I have been able to generate a known sequence for a particular neuronal variant of Src (known as N2-SRC), and I have been trying to add it to my reference transcriptome. On Galaxy I converted my reference transcriptome (latest Ensembl build, 102) from FASTA to Tabular and then to CSV format. After downloading the CSV, I typed in an annotation and a sequence for the N2 splice variant.
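To make the goal concrete, here is a minimal sketch of what the edit should amount to: one extra FASTA record added to the end of the transcriptome, with a header line followed by wrapped sequence lines. It works on the FASTA text directly rather than going through CSV. The function names, the header `N2-SRC_custom`, and the short sequence are placeholders I made up for illustration, not the real N2-SRC record.

```python
def format_fasta_record(header: str, sequence: str, width: int = 60) -> str:
    """Return one FASTA record: a '>' header line plus wrapped sequence lines."""
    # Remove any whitespace or line breaks pasted in with the sequence.
    seq = "".join(sequence.split()).upper()
    # Wrap the sequence at a fixed width, as most FASTA files do.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


def append_record(fasta_text: str, header: str, sequence: str, width: int = 60) -> str:
    """Return the FASTA text with one new record appended at the end."""
    # Make sure the existing text ends with a newline so the new header
    # starts on its own line instead of being glued to the last sequence.
    if fasta_text and not fasta_text.endswith("\n"):
        fasta_text += "\n"
    return fasta_text + format_fasta_record(header, sequence, width)


if __name__ == "__main__":
    # Placeholder transcriptome and placeholder variant (hypothetical values).
    reference = ">transcript_1\nACGTACGTAC\n"
    updated = append_record(reference, "N2-SRC_custom", "acgt acgt\nacgt")
    print(updated)
```

The important detail is that the header and the sequence stay on separate lines, which is exactly what seems to be getting lost in my CSV round trip.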

My issue arises when I try to upload my edited CSV transcriptome: it is recognised as a text file. When I use the Convert delimiters to TAB function I get a tabular file, which I then convert into a FASTA file. However, when I run kallisto with my edited transcriptome, the reported lengths of all my transcripts have increased, and the results look wrong as a consequence. I have come to the conclusion that there must be a formatting issue somewhere along the way, with my gene name text being incorporated into my sequence text.
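One way I could confirm this suspicion is to scan the converted FASTA for sequence lines that contain anything other than nucleotide letters. The sketch below is only a rough check under my own assumptions: the function names are hypothetical, and it treats A, C, G, T and N as the only valid characters, which may be too strict if the reference uses other IUPAC codes.

```python
def read_fasta(text: str) -> dict:
    """Parse FASTA text into a dict of {header: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the full header text (minus '>') as the record name.
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def suspicious_records(records: dict, alphabet: str = "ACGTN") -> dict:
    """Return {header: unexpected characters} for sequences with non-nucleotide text."""
    allowed = set(alphabet)
    bad = {}
    for name, seq in records.items():
        extra = set(seq.upper()) - allowed
        if extra:
            bad[name] = "".join(sorted(extra))
    return bad


if __name__ == "__main__":
    # Toy example: the second record has an identifier fused into its sequence.
    example = ">t1\nACGT\n>t2\nENST0001ACGT\n"
    recs = read_fasta(example)
    for header, seq in recs.items():
        print(header, len(seq))
    print(suspicious_records(recs))
```

If identifiers or gene names have been merged into the sequences, they should show up here as digits or unexpected letters, and the lengths printed would be longer than those in the original Ensembl file.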

Any ideas on how I could fix this issue would be greatly appreciated.

Many thanks,
Ryan