Assembly statistics in reference-based RNA-seq analysis
tools

How can I estimate assembly statistics in a reference-based RNA-seq analysis?
I used a FastQ-Trimmomatic-HISAT2-StringTie-StringTie merge-DESeq2 pipeline to analyze my RNA-seq data.
Which tools in Galaxy should I use to estimate the assembly statistics?

susheel

Hi @susheel_raina

With these tools, the reads are not actually assembled into a transcriptome. The workflow you are following instead creates a type of “pseudo assembly” based on coordinates and generates abundance statistics from that, rather than producing fully assembled transcripts in a fasta file.

There isn’t a tutorial dedicated to full assembly of RNA-seq reads, but Galaxy does have a few tools you can explore. rnaSPAdes and Trinity are two common choices. The first is slightly preferred, but you can explore and make your own decisions. :slight_smile:

Instructions for both are on the tool forms, and this forum has plenty of discussion about technical troubleshooting. General usage is also covered online by several groups, some of it in the context of Galaxy. An internet search will find those, and usage in Galaxy will be similar to running the tools directly if you decide to follow a non-Galaxy publication or protocol.

If I misunderstood what you were asking for, could you explain in more detail what you mean by “assembly statistics”? In our tutorials, that usually refers to generating estimates of how “complete” a fasta assembly is for an organism, and usually in the context of a genome assembly, not a transcriptome assembly (so, DNA reads instead of RNA).
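In case it helps pin down the question: if by “assembly statistics” you mean simple length summaries (sequence count, total bases, N50) for a fasta of assembled transcripts, such as the output of Trinity or rnaSPAdes, here is a minimal Python sketch of what those numbers are. This is only an illustration and not a Galaxy tool. The function names `fasta_lengths`, `n50`, and `summarize` are made up for this example, and it assumes a plain, uncompressed fasta. Note that these numbers describe contiguity, not the “completeness” mentioned above.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Collect a few basic length statistics (hypothetical helper)."""
    return {
        "n_sequences": len(lengths),
        "total_bases": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Tiny made-up example; replace with open("transcripts.fasta") for a real file.
    example = [">t1", "ACGTACGTAC", ">t2", "ACGT", "AC", ">t3", "ACG"]
    print(summarize(fasta_lengths(example)))
```

For the toy input, the three sequences have lengths 10, 6, and 3, so the N50 is 10. For real data, a dedicated tool will report far more than this sketch does.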

Hope this helps!