I noticed a new parameter (Does the input have read pairs?) in the featureCounts options. I can choose between single-end, paired-end counting reads individually, and paired-end counting read pairs as 1 fragment. Regarding the paired-end options, I don’t understand what the practical implications would be for my RNA-seq study (differential gene expression analysis between 2 groups). Also, if I choose to count as fragments, will a fragment only be counted if both reads of the pair are aligned correctly? If I choose to count them individually, I can also opt to “only allow fragments with both reads aligned”; does this just mean I’ll have double the number of counts compared to counting as 1 fragment? If that’s the only difference, it doesn’t sound very relevant to me, considering that counts will be normalized per million in the next steps.
featureCounts has many options for fine-tuning how the counts are generated. Open the Advanced settings to see even more.
Many of those you mentioned will double count (or more), along with some of the other options.
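On whether double counting matters once counts are normalized per million, a small arithmetic sketch may help. The gene names and numbers below are made up for illustration and do not come from any real run. If every read pair contributed exactly two reads to the same gene, the per-million values would not change; they shift when some pairs contribute unevenly (for example, when only one mate overlaps the feature).

```python
def cpm(counts):
    """Counts per million: scale each gene's count by the library total."""
    total = sum(counts.values())
    return {gene: n * 1_000_000 / total for gene, n in counts.items()}

# Hypothetical fragment-level counts (one count per read pair).
fragments = {"geneA": 100, "geneB": 300, "geneC": 600}

# Case 1: every pair contributes exactly two reads to the same gene.
reads_exact = {gene: 2 * n for gene, n in fragments.items()}

# Case 2 (assumed for illustration): for geneA, 40 pairs have only one
# mate overlapping the gene, so each of those pairs adds 1 read, not 2.
reads_uneven = dict(reads_exact)
reads_uneven["geneA"] -= 40

cpm_fragments = cpm(fragments)
cpm_exact = cpm(reads_exact)
cpm_uneven = cpm(reads_uneven)

for gene in fragments:
    print(gene, cpm_fragments[gene], cpm_exact[gene], round(cpm_uneven[gene], 1))
```

In the exact-doubling case the normalized values match the fragment-level ones, so only the uneven case would carry through to downstream comparisons.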
After a run, click on the job info “i” icon and scroll down to the logs to see the command line. That reveals the options that were applied, which you can compare against the authors' guide here: https://subread.sourceforge.net/featureCounts.html
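If you copy the command lines from two job logs, a quick way to see which options differ is to compare the flag sets. This is only a sketch: the two command lines below are hypothetical (the file names are placeholders, and the command Galaxy actually logs may look different). The flags `-p`, `--countReadPairs` and `-B` are the paired-end options described in the featureCounts guide for recent Subread releases, but check them against the version you ran.

```python
import shlex

def flags(command_line):
    """Return the option tokens (anything starting with '-') in a command line.

    Option values are ignored, so this only shows which flags are present.
    """
    return {tok for tok in shlex.split(command_line) if tok.startswith("-")}

# Hypothetical command lines, as they might be copied from two job logs.
run_fragments = "featureCounts -a genes.gtf -o out.txt -p --countReadPairs -B aligned.bam"
run_reads = "featureCounts -a genes.gtf -o out.txt -p aligned.bam"

only_in_fragments = flags(run_fragments) - flags(run_reads)
only_in_reads = flags(run_reads) - flags(run_fragments)

print("only in first run: ", sorted(only_in_fragments))
print("only in second run:", sorted(only_in_reads))
```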
For standard RNA-seq, the default settings in Galaxy are probably what you want to use. But you could experiment with other parameter sets and compare across runs to see what happens. You could even subset your BAM to a specific region of interest, and quickly generate the counts that way to focus in and learn exactly how the tool works.
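To compare two runs, you could download both count tables and check gene by gene where they disagree. The sketch below assumes a two-column, tab-separated table (gene ID, count) with a header row; adjust the parsing if your output has more columns. The tables here are made-up stand-ins, not real output.

```python
import csv
import io

def read_counts(handle):
    """Read a tab-separated table of (gene ID, count) with one header row."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return {row[0]: int(row[1]) for row in reader if row}

# Made-up tables standing in for two downloaded outputs:
# run A counted fragments, run B counted reads individually.
run_a = io.StringIO("Geneid\tsample\ngeneA\t100\ngeneB\t300\ngeneC\t0\n")
run_b = io.StringIO("Geneid\tsample\ngeneA\t160\ngeneB\t600\ngeneC\t0\n")

a = read_counts(run_a)
b = read_counts(run_b)

# Genes whose count in run B is not exactly twice the count in run A.
not_doubled = {g: (a[g], b.get(g, 0)) for g in a if b.get(g, 0) != 2 * a[g]}
print(not_doubled)
```

Genes that show up in `not_doubled` are the ones worth inspecting in a genome browser to see how the mates overlap the annotation.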
This tool can be used with all sorts of read types, including single-cell long reads, where tuning how the overlap is calculated will matter due to the extended length of the “query” (reads) versus “features” (whatever the annotation represents).