Can someone explain how HISAT2 works? I’m currently processing an unstranded, paired-end data set that has about 20 million reads after trimming. After running HISAT2 against the mm10 reference genome, the output is about 10 million reads. Where did the other half go? What’s weird is that when I run htseq-count, the counts add up to 20 million.
Follow-up question: ideally, what should I be expecting? It seems like a lot of reads are falling under the no feature category, which I’m assuming is not good.
Category | HISAT2 on data 2 and data 1: aligned reads (BAM) |
---|---|
__no_feature | 8905861 |
__ambiguous | 84369 |
__too_low_aQual | 0 |
__not_aligned | 5158843 |
__alignment_not_unique | 6342605 |
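
To show how these categories add up, here is a quick Python sketch that sums only the special counters from the table above and prints each one's share. It leaves out the reads that were actually assigned to genes, since those rows aren't shown here, and the names `special_counters` and `summarize` are just ones I made up for illustration.

```python
# Special counters copied from the htseq-count table above.
# Reads assigned to genes are NOT included (those rows are not shown in the post).
special_counters = {
    "__no_feature": 8905861,
    "__ambiguous": 84369,
    "__too_low_aQual": 0,
    "__not_aligned": 5158843,
    "__alignment_not_unique": 6342605,
}


def summarize(counters):
    """Return the total of all counters and each counter's fraction of that total."""
    total = sum(counters.values())
    fractions = {name: count / total for name, count in counters.items()}
    return total, fractions


total, fractions = summarize(special_counters)

print(f"sum of special counters: {total:,}")
for name, frac in fractions.items():
    # Left-align the category name, right-align the count and percentage.
    print(f"{name:<26}{special_counters[name]:>12,}{frac:>8.1%}")
```

The special counters alone sum to 20,491,678, which is where the roughly 20 million figure comes from before any gene-level counts are added.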
Also, I’d appreciate any feedback on my pipeline and possible improvements:
I use Trimmomatic and FastQC,
then HISAT2 to output BAM,
then htseq-count for DESeq2.
Is there anything I should be cautious about, or a crucial step I’m leaving out?
Thanks for all the advice and help!