Hi, I am relatively new to RNA-seq data analysis and have been following the "Finding and quantifying new transcripts" tutorial (https://galaxyproject.org/tutorials/nt_rnaseq/). My transcriptomics study looks at the effects of a chemical on a mycobacterial species. I uploaded the FASTA and GTF files and followed the tutorial closely. My HISAT2 summary was as follows:
```
HISAT2 summary stats:
    Total pairs: 20863625
        Aligned concordantly or discordantly 0 time: 3642143 (17.46%)
        Aligned concordantly 1 time: 8046746 (38.57%)
        Aligned concordantly >1 times: 9064704 (43.45%)
        Aligned discordantly 1 time: 110032 (0.53%)
    Total unpaired reads: 7284286
        Aligned 0 time: 5004928 (68.71%)
        Aligned 1 time: 931102 (12.78%)
        Aligned >1 times: 1348256 (18.51%)
    Overall alignment rate: 88.01%
```
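As a sanity check, the overall alignment rate can be reproduced from the counts above. This short Python sketch assumes that HISAT2 counts each mate of a pair as one read (so the denominator is twice the number of pairs) and that the "Aligned 0 time" line under the unpaired reads is the number of individual mates that never aligned. The function name is just for illustration.

```python
# Counts copied from the HISAT2 summary above.
total_pairs = 20_863_625
unaligned_pairs = 3_642_143   # pairs aligned concordantly or discordantly 0 times
unaligned_mates = 5_004_928   # mates of those pairs that also failed to align on their own


def overall_alignment_rate(total_pairs: int, unaligned_mates: int) -> float:
    """Fraction of individual reads (mates) that aligned at least once.

    Assumes each pair contributes two reads and that every read not
    counted in `unaligned_mates` aligned somewhere.
    """
    total_reads = 2 * total_pairs
    return (total_reads - unaligned_mates) / total_reads


# The "Total unpaired reads" block is made up of both mates of every unaligned pair.
assert 2 * unaligned_pairs == 7_284_286

rate = overall_alignment_rate(total_pairs, unaligned_mates)
print(f"{rate:.2%}")  # 88.01%
```

So the 88.01% figure is consistent with roughly 5 million of about 41.7 million mates failing to align.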
I then filtered for quality reads only (during filtering I did not enable the "paired reads" or "paired reads mapped in proper pair" options), ran StringTie and StringTie merge to create a merged transcript file as the tutorial describes, and then ran featureCounts to count reads. However, the featureCounts summary for one of my datasets looked like this:
| Status | Filtered LJ5 Control 1 BAM |
|---|---|
| Assigned | 1844744 |
| Unassigned_Unmapped | 0 |
| Unassigned_MappingQuality | 0 |
| Unassigned_Chimera | 7171 |
| Unassigned_FragmentLength | 0 |
| Unassigned_Duplicate | 0 |
| Unassigned_MultiMapping | 0 |
| Unassigned_Secondary | 0 |
| Unassigned_NonSplit | 0 |
| Unassigned_NoFeatures | 19955 |
| Unassigned_Overlapping_Length | 0 |
| Unassigned_Ambiguity | 7025997 |
Most reads were left unassigned due to ambiguity (7,025,997, compared with 1,844,744 assigned).
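Expressed as proportions, the table shows that ambiguity, rather than missing annotation (`Unassigned_NoFeatures`), accounts for nearly all of the unassigned reads. The Python sketch below computes each status as a share of the total, leaving out the rows that are zero. Whether featureCounts counted single reads or read pairs depends on its paired-end setting, so the sketch simply treats every row as a count of the same unit; `status_fractions` is a hypothetical helper name.

```python
# Counts copied from the featureCounts summary above (all-zero rows omitted).
summary = {
    "Assigned": 1_844_744,
    "Unassigned_Chimera": 7_171,
    "Unassigned_NoFeatures": 19_955,
    "Unassigned_Ambiguity": 7_025_997,
}


def status_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each status as a fraction of everything featureCounts evaluated."""
    total = sum(counts.values())
    return {status: n / total for status, n in counts.items()}


fractions = status_fractions(summary)

# Print the statuses from largest to smallest share.
for status, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{status:<24}{frac:7.2%}")
```

With these numbers, ambiguity covers about 79% of the total and only about 21% of reads are assigned.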
I followed the tutorial's featureCounts parameters closely, except for strandedness: my reads are split roughly half and half between the two strands, so I set it to unstranded. Any help would be appreciated. Thank you.
Yours sincerely,
Wei Jie