Hello,
I would like to compute the ExN50 statistic for a de novo transcriptome assembled with Trinity.
Following the steps described on the Trinity GitHub page, I did the following, in order:
Aligned reads back and estimated abundances (kallisto)
Built expression matrix (kallisto)
Computed the ExN50 stat
However, although the transcriptome consists of about 37,000 transcripts, the ExN50 statistic ends up being based on only about 27,000 entries. These correspond to the Trinity "genes" in the gene-to-transcript mapping.
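One way to check where the smaller number comes from is to count the distinct Trinity genes behind the transcripts. The sketch below assumes the usual Trinity naming scheme, where a transcript ID such as TRINITY_DN10_c0_g1_i2 belongs to gene TRINITY_DN10_c0_g1 (the trailing _i<N> is the isoform). The function names and the toy IDs are made up for illustration; with real data the IDs would come from the assembly FASTA headers or the gene-to-transcript map.

```python
import re


def trinity_gene_id(transcript_id: str) -> str:
    """Strip the trailing isoform suffix (_i<N>) from a Trinity transcript ID."""
    return re.sub(r"_i\d+$", "", transcript_id)


def count_genes_and_transcripts(transcript_ids):
    """Return (number of distinct genes, number of distinct transcripts)."""
    transcripts = set(transcript_ids)
    genes = {trinity_gene_id(t) for t in transcripts}
    return len(genes), len(transcripts)


# Toy IDs, invented for illustration only.
ids = [
    "TRINITY_DN10_c0_g1_i1",
    "TRINITY_DN10_c0_g1_i2",  # second isoform of the same gene
    "TRINITY_DN10_c0_g2_i1",
    "TRINITY_DN11_c0_g1_i1",
]

n_genes, n_transcripts = count_genes_and_transcripts(ids)
print(n_genes, n_transcripts)  # 3 4
```

If the gene count obtained this way matches the number of entries used for ExN50, the statistic was computed per gene rather than per transcript.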
My questions are:
How is this possible?
Is this a reasonable “approximation” for computing this statistic?
“The ExN50 now intentionally works off of the ‘gene’ rather than the transcript. The transcript based value can be a bit misleading because you can have long transcripts that are, after quantification, presumed to be lowly expressed and these can end up contributing to longer N50 values at low expression levels (artifactually)”
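To make the quoted explanation concrete, here is a minimal sketch of a gene-level ExN50, not Trinity's actual script. It assumes that a gene's expression is the sum of its isoforms' expression and that its length is the expression-weighted mean of the isoform lengths; that is my understanding of the Trinity documentation, so treat it as an assumption. The function names, the record layout, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def gene_level_exn50(records, percent):
    """records: iterable of (gene_id, transcript_length, expression).

    Returns (N50 of the selected genes, number of genes selected).
    """
    expr = defaultdict(float)
    weighted_len = defaultdict(float)
    for gene, length, tpm in records:
        expr[gene] += tpm                   # assumed: gene expression = sum over isoforms
        weighted_len[gene] += length * tpm  # numerator of the weighted mean length

    # Unexpressed genes are dropped; the rest are ranked by expression.
    genes = [(e, weighted_len[g] / e) for g, e in expr.items() if e > 0]
    genes.sort(key=lambda pair: pair[0], reverse=True)

    cutoff = sum(e for e, _ in genes) * percent / 100
    running = 0.0
    selected = []
    for e, length in genes:
        selected.append(length)
        running += e
        if running >= cutoff:
            break
    return n50(selected), len(selected)


# Toy data: g1 has two isoforms; g3 is long but barely expressed.
records = [
    ("g1", 1000, 60.0),
    ("g1", 3000, 20.0),
    ("g2", 2000, 15.0),
    ("g3", 5000, 5.0),
]

for pct in (80, 90, 100):
    print(pct, gene_level_exn50(records, pct))
# 80 (1500.0, 1)
# 90 (2000.0, 2)
# 100 (5000.0, 3)
```

In this toy case the N50 jumps to 5000 only when the long, lowly expressed gene g3 is included at E100, which is the kind of inflation at low expression levels that the quote describes.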