Welcome @NitDawg
Your read data appears to have high duplication. If you run FastQC, you’ll find more details/confirmation about read duplication in those reports.
QA (tool: Trimmomatic) won’t help if the source data actually has high redundancy. It could just be low-quality sequencing results, or very deep sequencing was done. Contamination could be a factor, but removing those reads won’t help to get more data assigned to a known gene from your reference annotation; it will just reduce the final number of “unassigned-ambiguity” reads later on in the pipeline.
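If you want a rough number outside of FastQC, here is a minimal Python sketch that counts exact-duplicate read sequences in a FASTQ file. It is not FastQC's method, so the value will not match the report exactly; it assumes a standard four-line-per-record FASTQ, and the function names are just made up for illustration.

```python
from collections import Counter
import gzip


def read_sequences(path):
    """Yield the sequence line of each record in a 4-line-per-record FASTQ file."""
    # Assumption: plain text or gzip-compressed FASTQ, chosen by file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # second line of every record is the sequence
                yield line.strip()


def duplication_fraction(sequences):
    """Return the fraction of reads that are exact copies of an earlier read."""
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Every distinct sequence counts once as "original"; the rest are duplicates.
    return 1.0 - len(counts) / total
```

Usage would look like `duplication_fraction(read_sequences("reads.fastq.gz"))`, where the file name is a placeholder. Keep in mind that highly expressed genes naturally produce many identical reads in RNA-seq, so a high value is not automatically a problem.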
One note: It is important to run HISAT2 with the option to output results that are formatted for StringTie. That is covered in the tutorial but is sometimes missed. Worth double-checking. Use the “rerun” (double circle icon) for the mapping jobs to review what options you used.
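For reference, on the HISAT2 command line this setting corresponds to the `--dta` flag (alignments tailored for transcript assemblers). In Galaxy you pick it in the tool form instead, so the sketch below only shows what the equivalent command might look like; the index prefix and file names are placeholders, and the helper function is hypothetical.

```python
import shlex


def hisat2_command(index_prefix, reads_1, reads_2, out_sam):
    """Build a paired-end HISAT2 command that includes the --dta flag."""
    return [
        "hisat2",
        "--dta",             # report alignments tailored for transcript assemblers
        "-x", index_prefix,  # placeholder: prefix of the HISAT2 index
        "-1", reads_1,       # placeholder: forward reads
        "-2", reads_2,       # placeholder: reverse reads
        "-S", out_sam,       # placeholder: SAM output file
    ]


cmd = hisat2_command("genome_index", "sample_R1.fastq", "sample_R2.fastq", "sample.sam")
print(shlex.join(cmd))
```

If the rerun form of your mapping job shows that this setting was left off, rerunning HISAT2 with it enabled before StringTie is the thing to try.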
More DE analysis tutorials can be found here under the group “Transcriptomics” if you want to compare methods/tool choices:
Hope that helps!