Hi,
The strand for HISAT2 paired-end inputs should be FR, RF, or Unstranded, so this might be a typo and you meant RF?
featureCounts also requires the strandedness setting to match what was used for mapping. That is an F, R, or Unstranded toggle.
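To make the matching concrete, here is a minimal sketch of how the HISAT2 strand values line up with the featureCounts strand values. It assumes the command-line conventions: HISAT2 `--rna-strandness` takes F/R for single-end and FR/RF for paired-end reads, and featureCounts `-s` takes 0 (unstranded), 1 (forward), or 2 (reverse). The Galaxy forms show these as labeled choices rather than numbers, and the function name is hypothetical, just for illustration.

```python
# Assumed correspondence between HISAT2 --rna-strandness values and
# featureCounts -s values (0 = unstranded, 1 = forward, 2 = reverse).
HISAT2_TO_FEATURECOUNTS = {
    "F": 1,   # single-end, read matches the transcript strand
    "FR": 1,  # paired-end, first mate matches the transcript strand
    "R": 2,   # single-end, read is the reverse complement
    "RF": 2,  # paired-end, first mate is the reverse complement
}


def featurecounts_strand(hisat2_strandness: str) -> int:
    """Return the featureCounts -s value matching a HISAT2 strand setting.

    Hypothetical helper: it only encodes the lookup table above.
    """
    key = hisat2_strandness.strip()
    if key.lower() == "unstranded":
        return 0
    try:
        return HISAT2_TO_FEATURECOUNTS[key.upper()]
    except KeyError:
        raise ValueError(
            f"Unknown HISAT2 strand setting: {hisat2_strandness!r}"
        ) from None


if __name__ == "__main__":
    for setting in ("FR", "RF", "Unstranded"):
        print(setting, "->", featurecounts_strand(setting))
```

So if the mapping was run with RF, the counting step should use the reverse setting, and a mismatch between the two usually shows up as a large NoFeatures count.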
The data is failing for three primary reasons:
- Unmapped:
  - Did you run the FASTQ data through QA/QC tools before mapping?
  - Some level of unmapped reads is expected. How much depends on the quality of your sequence data, and trimming cannot eliminate all data problems.
- Mapping quality:
  - The default is “12” but that can be modified under advanced settings.
  - Was this changed? If yes, try using the default.
- NoFeatures:
  - If the strandedness is incorrect, the number of reads discarded for this reason can be very high.
  - It could also be high because your reads do not correspond (content-wise) to known transcripts.
  - Was there something special about how the library was constructed? If so, you might need to provide your own reference annotation that matches your sequencing target (ncRNA, etc).
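If it helps to see which of these three reasons dominates, the featureCounts summary output can be turned into percentages. The sketch below assumes the usual tab-separated summary layout (a `Status` header row, then one row per category such as `Assigned`, `Unassigned_Unmapped`, `Unassigned_MappingQuality`, `Unassigned_NoFeatures`) with a single sample column. The function name is hypothetical and the counts in the example are invented for illustration, not taken from your data.

```python
def summarize_featurecounts(summary_text: str) -> dict[str, float]:
    """Convert a single-sample featureCounts summary into percentages.

    Assumes tab-separated rows of "<status>\t<count>" with a leading
    "Status" header row, as in the summary file featureCounts writes.
    """
    counts: dict[str, int] = {}
    for line in summary_text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "Status":
            continue  # skip the header row and any malformed lines
        counts[fields[0]] = int(fields[1])

    total = sum(counts.values())
    if total == 0:
        return {status: 0.0 for status in counts}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Invented example counts, for illustration only.
EXAMPLE_SUMMARY = (
    "Status\tsample.bam\n"
    "Assigned\t500\n"
    "Unassigned_Unmapped\t100\n"
    "Unassigned_MappingQuality\t150\n"
    "Unassigned_NoFeatures\t250\n"
)

if __name__ == "__main__":
    percentages = summarize_featurecounts(EXAMPLE_SUMMARY)
    # Print the largest categories first.
    for status, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
        print(f"{status}\t{pct:.1f}%")
```

A very large `Unassigned_NoFeatures` share would point toward the strandedness or annotation questions, while a large `Unassigned_Unmapped` share points back to read quality and the mapping step.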
I would suggest comparing your methods to those in this Galaxy Training Network (GTN) tutorial. QA/QC, strandedness assessment, and usage for these tools are all covered.
- RNA-seq: Discovering and quantifying new transcripts - an in-depth transcriptome analysis example.
Using the built-in annotation for mm10 is usually a very good choice for RNA-seq data. There are other sources, but I don’t think the results will differ much when using a basic transcript reference annotation dataset from any source.
But if you want to try, Gencode and iGenomes are good alternative sources, with Gencode a bit simpler to get into Galaxy. This prior Q&A is about human (hg38), but both sources also have data for mouse (mm10): RNA-STAR and hg38 GTF reference annotation - #2 by jennaj