DESeq2 Error on Kallisto output

Sarah_Perkins · December 4, 2019, 4:53pm

Hi all, I used kallisto to generate transcript abundance files for RNA-seq data and am trying to run them through DESeq2. My inputs are four .tsv files plus a .gene_trans_map file, and the following error message is generated:

Note: importing abundance.h5 is typically faster than abundance.tsv
reading in files with read.delim (install 'readr' package for speed up)
1 2 3 4
removing duplicated transcript rows from tx2gene
Error in $<-.data.frame(*tmp*, "TXNAME", value = character(0)) :
  replacement has 0 rows, data has 510059
Calls: get_deseq_dataset -> $<- -> $<-.data.frame

I have tried changing a variety of parameters, including setting header to false, and I have also tried only using one dataset per factor level, and nothing is working to fix this error. I have also tried using a .gff3 file rather than a .gene_trans_map file, and that also is not making a difference. Any advice would be much appreciated!
Thank you.

jennaj · December 4, 2019, 9:44pm

Welcome, @Sarah_Perkins

I see your bug report sent in from Galaxy Main https://usegalaxy.org.

The problem is that your transcript fasta dataset identifiers do not match up with the transcript-to-gene mapping dataset.

The fasta (data 14) and TPM (data 356 + 357 + 358 + 362) transcript identifiers are formatted as:

TR100009|c0_g1_i1|m.500685

The tabular transcript-tab-gene dataset (data 347) has two problems: 1) the transcript and gene identifiers are truncated and do not match the TPM inputs, and 2) the column order looks as if it might be reversed (gene-tab-transcript). As long as the genes are named consistently, it doesn't matter what they are, but the transcript names do need to match the other inputs.

TR1|c0_g1	TR1|c0_g1_i1
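If the mapping does turn out to be in gene-tab-transcript order, swapping the two columns outside Galaxy is simple. Below is a minimal Python sketch, assuming a two-column, tab-separated file with no header line; the function name `swap_columns` is made up for illustration. Note that swapping alone would not repair the truncated identifiers.

```python
import csv
import io


def swap_columns(src, dst):
    """Copy a two-column tab-separated mapping, swapping the column order."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        first, second = row
        writer.writerow([second, first])


# Example using the line quoted above (gene first, transcript second).
src = io.StringIO("TR1|c0_g1\tTR1|c0_g1_i1\n")
dst = io.StringIO()
swap_columns(src, dst)
swapped = dst.getvalue()  # transcript first, gene second
```

With real files, `src` and `dst` would be handles opened with `open(path, newline="")` rather than `io.StringIO` objects.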

In short, primary IDs need to be in exactly the same format across all inputs, or tools cannot match the data up. This is true for any tool, not just DESeq2.
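One way to check this before running the tool is to count how many transcript IDs the abundance file and the mapping file actually share. The sketch below is an illustration, not part of the Galaxy tool: it assumes the standard kallisto abundance.tsv layout (header line, transcript ID in the first `target_id` column) and a headerless two-column mapping. The helper names and the numeric values in the toy data are invented.

```python
import csv
import io


def read_column(handle, column, skip_header=False):
    """Return the set of values found in one column of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    return {row[column] for row in reader if len(row) > column}


def compare_ids(abundance_ids, mapping_ids):
    """Summarise how many transcript IDs the two inputs share."""
    return {
        "shared": len(abundance_ids & mapping_ids),
        "only_in_abundance": len(abundance_ids - mapping_ids),
        "only_in_mapping": len(mapping_ids - abundance_ids),
    }


# Toy data modelled on the identifiers quoted in this thread (numbers invented).
abundance = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "TR100009|c0_g1_i1|m.500685\t900\t700.0\t12.0\t3.5\n"
)
mapping = io.StringIO("TR1|c0_g1\tTR1|c0_g1_i1\n")

abundance_ids = read_column(abundance, 0, skip_header=True)
# Try both mapping columns, since the column order may be reversed.
for col in (0, 1):
    mapping.seek(0)
    report = compare_ids(abundance_ids, read_column(mapping, col))
    print(f"mapping column {col}: {report}")
```

If neither mapping column shares any IDs with the abundance file, as in this toy case, the identifier formats differ and the mapping needs to be regenerated or reformatted rather than just reordered.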

The gff3 dataset is missing a header line: ##gff-version 3. That is why it was given the more generic gff datatype during Upload, and why the tool form does not recognize it as a valid input.
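Adding the missing header can be done in any text editor. For completeness, here is a small Python sketch that prepends it only when it is absent. The function name `ensure_gff3_header` and the example feature line are hypothetical, and the dataset's datatype would still need to be set or re-detected as gff3 in Galaxy afterwards.

```python
GFF3_HEADER = "##gff-version 3"


def ensure_gff3_header(text):
    """Return the text with the GFF3 version directive as its first line."""
    lines = text.splitlines()
    if lines and lines[0].strip().startswith(GFF3_HEADER):
        return text  # header already present, leave the content untouched
    return GFF3_HEADER + "\n" + text


# Made-up single-feature example without the header line.
example = "contig1\t.\tgene\t1\t900\t.\t+\t.\tID=gene1\n"
fixed = ensure_gff3_header(example)
```

Calling the function a second time on `fixed` returns it unchanged, so it is safe to run on files that may already carry the header.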

Also, replicates are required with DESeq2 in Galaxy.

FAQ: Extended Help for Differential Expression Analysis Tools, at https://galaxyproject.org/support/

Hope that helps!

Sarah_Perkins · December 4, 2019, 10:02pm

Thank you! That seems to solve it!

jennaj · December 5, 2019, 2:10am

Super, glad you got this working!