How can I improve very low assigned rate in featureCounts?

Hi,

The strand for HISAT2 paired-end inputs should be FR, RF, or Unstranded, so this might be a typo and you meant RF?

featureCounts also requires the strandedness setting to match what was used for mapping. In that tool, the choice is F (forward), R (reverse), or Unstranded.
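If it helps to keep the two settings straight, here is a minimal sketch of how the HISAT2 paired-end choice lines up with the featureCounts one. It assumes the usual convention that FR is a forward-stranded library and RF a reverse-stranded one, and it uses the featureCounts command-line `-s` values (0, 1, 2). The function name is only for illustration and is not part of either tool.

```python
# Sketch only: maps a HISAT2 paired-end strand setting to the matching
# featureCounts strandedness value (the command-line -s option).
# Assumption: FR = forward-stranded library, RF = reverse-stranded library.
HISAT2_TO_FEATURECOUNTS = {
    "FR": 1,          # forward stranded  -> featureCounts -s 1
    "RF": 2,          # reverse stranded  -> featureCounts -s 2
    "UNSTRANDED": 0,  # unstranded        -> featureCounts -s 0
}


def featurecounts_strand(hisat2_strand: str) -> int:
    """Return the featureCounts -s value for a HISAT2 strand setting."""
    key = hisat2_strand.strip().upper()
    if key not in HISAT2_TO_FEATURECOUNTS:
        raise ValueError(
            f"Unknown strand setting {hisat2_strand!r}; "
            "expected FR, RF, or Unstranded"
        )
    return HISAT2_TO_FEATURECOUNTS[key]


if __name__ == "__main__":
    for setting in ("FR", "RF", "Unstranded"):
        print(setting, "->", featurecounts_strand(setting))
```

A mismatch between the two (for example RF at mapping and forward at counting) is one common way to end up with most reads unassigned.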

The data is failing for three primary reasons:

  • Unmapped:

    • Did you run the FASTQ data through QA/QC tools before mapping?
    • Some level of unmapped reads is expected; how much depends on the quality of your sequence data. Trimming cannot eliminate all data problems.
  • Mapping quality:

    • The default minimum mapping quality is “12”, but that can be modified under advanced settings.
    • Was this changed? If yes, try using the default.
  • NoFeatures:

    • If the strandedness is incorrect, the number of reads discarded for this reason can be very high.
    • It could also be high because your reads do not correspond (content-wise) to known transcripts.
    • Was there something special about how the library was constructed? If so, you might need to provide your own reference annotation that matches your sequencing target (ncRNA, etc).
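To see which of these three reasons dominates, the featureCounts summary output can be turned into percentages. The sketch below assumes the standard tab-separated summary layout (a `Status` header line, then one row per category such as `Assigned`, `Unassigned_Unmapped`, `Unassigned_MappingQuality`, `Unassigned_NoFeatures`) with a single sample column. The function name and the numbers in the example are made up for illustration and are not from your data.

```python
import csv
import io


def summary_percentages(summary_text: str) -> dict:
    """Convert a single-sample featureCounts summary into percentages.

    Expects tab-separated text: a header row ("Status" + sample name),
    then one row per category with its read/fragment count.
    Categories with a count of zero are left out of the result.
    """
    rows = csv.reader(io.StringIO(summary_text), delimiter="\t")
    next(rows, None)  # skip the "Status<TAB>sample" header line
    counts = {row[0]: int(row[1]) for row in rows if len(row) >= 2}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        status: 100.0 * n / total
        for status, n in counts.items()
        if n > 0
    }


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical sample).
    example = (
        "Status\tsample.bam\n"
        "Assigned\t200\n"
        "Unassigned_Unmapped\t100\n"
        "Unassigned_MappingQuality\t50\n"
        "Unassigned_NoFeatures\t650\n"
        "Unassigned_Ambiguity\t0\n"
    )
    for status, pct in summary_percentages(example).items():
        print(f"{status}\t{pct:.1f}%")
```

A large `Unassigned_NoFeatures` share next to a small `Assigned` share, as in the made-up example, is the pattern a strandedness mismatch tends to produce, so that setting is worth checking first.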

I would suggest comparing your methods to those in this Galaxy Training Network (GTN) tutorial. QA/QC, strandedness assessment, and usage for these tools are all covered.

Using the built-in annotation for mm10 is usually a very good choice for RNA-seq data. There are other sources, but I don’t think the result will be much different with a basic transcript reference annotation dataset from any of them.

But if you want to try, Gencode and iGenomes are good alternative sources, with Gencode a bit simpler to get into Galaxy. This prior Q&A is about human (hg38), but both sources also have data for mouse (mm10): RNA-STAR and hg38 GTF reference annotation - #2 by jennaj