ExN50 is computed on Trinity 'genes', not transcripts, in a de novo assembled transcriptome

luca · December 16, 2019, 8:42am

Hello,
I would like to compute the ExN50 statistic for a de novo transcriptome assembled with Trinity.
Following the steps described on the Trinity GitHub page, I did the following, in order:

1. Aligned reads back and estimated abundances (kallisto)
2. Built expression matrix (kallisto)
3. Computed the ExN50 stat

However, even though the transcriptome consists of about 37,000 transcripts, the ExN50 statistic ends up being based on only about 27,000 entries. That number matches the number of Trinity 'genes' in the gene-to-transcript mapping.
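The gap between the two counts can be checked with a minimal sketch like the one below. It assumes the transcript IDs follow Trinity's usual naming, where the isoform is given by a trailing `_i<number>` and everything before it is the 'gene'. The IDs and the helper names `gene_id` and `group_by_gene` are made up for illustration.

```python
import re
from collections import defaultdict

def gene_id(transcript_id):
    # Assumption: Trinity-style IDs end in an isoform suffix such as "_i1";
    # dropping that suffix gives the 'gene' identifier.
    return re.sub(r"_i\d+$", "", transcript_id)

def group_by_gene(transcript_ids):
    # Map each gene identifier to the list of its transcript isoforms.
    genes = defaultdict(list)
    for tid in transcript_ids:
        genes[gene_id(tid)].append(tid)
    return genes

# Hypothetical IDs, for illustration only
ids = [
    "TRINITY_DN100_c0_g1_i1",
    "TRINITY_DN100_c0_g1_i2",
    "TRINITY_DN100_c0_g2_i1",
    "TRINITY_DN200_c1_g1_i1",
]
genes = group_by_gene(ids)
print(len(ids), "transcripts ->", len(genes), "genes")
```

Any gene with more than one isoform makes the gene count smaller than the transcript count, which is the kind of difference seen here.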

My questions are:

1. How is this possible?
2. Is this a reasonable “approximation” for computing the statistic?

Thank you for your help!
Luca

luca · January 2, 2020, 12:17pm

I finally found an answer.
Here is Brian Haas's reply to the same issue, posted by another Trinity user:

[https://groups.google.com/forum/#!searchin/trinityrnaseq-users/gene$20to$20transcripts|sort:date/trinityrnaseq-users/LW2kF2P7ACk/1kerYhkFAgAJ]

“The ExN50 now intentionally works off of the ‘gene’ rather than the transcript. The transcript based value can be a bit misleading because you can have long transcripts that are, after quantification, presumed to be lowly expressed and these can end up contributing to longer N50 values at low expression levels (artifactually)”
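To make the effect described in the quote concrete, here is a rough sketch of an ExN50-style calculation at transcript level and at gene level. This is not Trinity's script. In particular, collapsing isoforms by summing their expression and taking an expression-weighted mean length is an assumption made for this example, and all numbers are invented.

```python
def n50(lengths):
    # Length at which the cumulative sum of the sorted lengths
    # reaches half of the total length.
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0

def exn50(features, pct):
    # features: list of (length, expression) pairs; pct: 0-100.
    # Keep the most highly expressed features that together account for
    # pct% of the total expression, then take the N50 of their lengths.
    ranked = sorted(features, key=lambda f: f[1], reverse=True)
    target = sum(expr for _, expr in ranked) * pct / 100
    running = 0.0
    kept = []
    for length, expr in ranked:
        kept.append(length)
        running += expr
        if running >= target:
            break
    return n50(kept)

def collapse_to_genes(transcripts):
    # transcripts: list of (gene, length, expression) triples.
    # Assumption for this sketch: gene expression is the sum over isoforms,
    # gene length is the expression-weighted mean isoform length.
    totals = {}
    for gene, length, expr in transcripts:
        e, weighted = totals.get(gene, (0.0, 0.0))
        totals[gene] = (e + expr, weighted + length * expr)
    return [(w / e if e > 0 else 0.0, e) for e, w in totals.values()]

# Hypothetical data: gene g1 has a long isoform with low expression
transcripts = [
    ("g1", 1000, 90.0),
    ("g1", 5000, 10.0),
    ("g2", 800, 50.0),
    ("g3", 600, 1.0),
]
per_transcript = [(length, expr) for _, length, expr in transcripts]
per_gene = collapse_to_genes(transcripts)

print("transcript-level E100 N50:", exn50(per_transcript, 100))
print("gene-level E100 N50:", exn50(per_gene, 100))
```

With this toy input, the long, lowly expressed isoform alone pushes the transcript-level value at 100% of expression up to 5000, while the gene-level value stays at 1400.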

Hope this helps,

Best,

Luca