Hi Swati,
Increasing either the minimum number of samples or the minimum CPM in low-count filtering should remove more genes. So yes, this is more aggressive filtering.
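To illustrate why both settings make the filter stricter, here is a toy sketch in Python. It is not the code the limma tool runs; the function names, the gene names and the counts are all made up for the example, and the only assumption is the usual rule "keep a gene if its CPM is at or above the threshold in at least N samples".

```python
def cpm(counts):
    """Convert one sample's raw counts to counts per million (CPM)."""
    total = sum(counts)
    return [c * 1_000_000 / total for c in counts]


def filter_low_counts(count_table, min_cpm, min_samples):
    """Return genes with CPM >= min_cpm in at least min_samples samples.

    count_table maps a gene name to a list of raw counts, one per sample.
    """
    genes = list(count_table)
    n_samples = len(next(iter(count_table.values())))
    # CPM is computed per sample, i.e. column by column.
    per_sample = [
        cpm([count_table[g][s] for g in genes]) for s in range(n_samples)
    ]
    kept = []
    for i, gene in enumerate(genes):
        n_pass = sum(1 for s in range(n_samples) if per_sample[s][i] >= min_cpm)
        if n_pass >= min_samples:
            kept.append(gene)
    return kept


# Hypothetical counts for four samples; each column sums to 1,000,000,
# so the CPM values equal the raw counts here.
toy_counts = {
    "geneA": [500, 480, 520, 510],
    "geneB": [5, 0, 6, 0],
    "geneC": [50, 45, 0, 0],
    "geneD": [999445, 999475, 999474, 999490],
}

baseline = filter_low_counts(toy_counts, min_cpm=1, min_samples=2)
higher_cpm = filter_low_counts(toy_counts, min_cpm=10, min_samples=2)
more_samples = filter_low_counts(toy_counts, min_cpm=1, min_samples=3)

print(baseline)      # all four genes pass
print(higher_cpm)    # geneB is removed
print(more_samples)  # geneB and geneC are removed
```

Raising the CPM threshold drops the weakly expressed gene, and raising the sample threshold drops genes that are expressed in only part of the samples.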
Not all samples have a >95% mapping rate: one sample has 17% mapped reads and another has 39%.
Different gene annotations give different results, including a different number of reads assigned to genes. The option you used, the built-in featureCounts gene model, is a safe choice. Examine the alignments in the UCSC Genome Browser and check for off-target reads. Just in case: click on the collection of BAM alignments > click on any BAM file in the collection > click on the bar plot (Visualize) icon > click on the "display at UCSC (main)" link in the middle window. Remove tracks you don’t need for faster performance. Add tracks like SNPs (see my previous reply). Make sure you have gene annotations. You can add several alignments to one UCSC session. Check the reads. Usually we expect a more or less random distribution along the gene (exons). The coverage is usually not uniform, but the reads do not normally map in blocks.
Maybe check the DE genes from the paper on the interactive limma plots and look at their expression level. Many top genes highlighted in the Volcano plot in the file I looked at have low expression. In addition, the group with the larger number of samples has a wide distribution for these genes: no expression in some samples, while other samples have expression in a range similar to the second group.
Are you sure the libraries are not stranded? Check the nucleotide distribution in the FastQC outputs for the sample ending in 63. The proportions of complementary nucleotides are not identical (C and T are over-represented in the F reads and under-represented in the R reads). This is an indication of a stranded protocol.
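The comparison I mean can be sketched in Python as below. This is a rough heuristic on made-up reads, not a replacement for a proper strandedness check; the function names and the toy sequences are my own, and real data would need many reads and some tolerance for noise.

```python
from collections import Counter


def base_fractions(reads):
    """Return the fraction of A, C, G and T over all given read sequences."""
    counts = Counter()
    for seq in reads:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


def complement_skew(reads):
    """Difference between each base and its complement (C vs G, T vs A)."""
    frac = base_fractions(reads)
    return {"C-G": frac["C"] - frac["G"], "T-A": frac["T"] - frac["A"]}


# Hypothetical reads: the reverse reads are the reverse complements of the
# forward reads, as expected for mates from a strand-specific library.
forward_reads = ["CCTTCTCTAG", "TTCCTCGATC"]
reverse_reads = ["CTAGAGAAGG", "GATCGAGGAA"]

forward_skew = complement_skew(forward_reads)
reverse_skew = complement_skew(reverse_reads)

print("forward:", forward_skew)
print("reverse:", reverse_skew)
```

If the skews are close to zero in both files, the composition gives no hint of strandedness. Skews with opposite signs in the F and R reads, as in this toy example, are the pattern I see in your FastQC reports.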
Galaxy Europe has Salmon and tximport, so you can import GENCODE transcripts and reproduce the published protocol. Alternatively, download the GENCODE gene annotations in GTF format and use them instead of the built-in featureCounts gene model.
As I said previously, I have some doubts about the data quality. I am not sure this can be compensated for by changing limma settings. Maybe do more QC, such as gene coverage.
The individual steps are described in the corresponding sections. The tutorial also describes how to check the strandedness.
The method section mentions similar age and outcome, but does not mention sex. Do you know if both male and female samples were used? It might be mentioned somewhere in the paper.
Kind regards,
Igor