I’ve been a Usegalaxy.org user for a couple of years now. One thing I have never been able to figure out for DESeq2 is what threshold the package on Galaxy defaults to when deciding which genes to include in DE analysis.
Love’s paper (brilliant, btw) says that any gene whose mean normalized count is below a given threshold is omitted from DGE Wald testing. This is to retain power when correcting for multiple tests. The algorithm “chooses” the threshold that maximizes the number of genes that would be called DE at a user-defined FDR (pre-test). Since the user never gets an opportunity to define this FDR, I was wondering what the default is on Galaxy.
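To make sure I’m describing the procedure sensibly, here is a minimal Python sketch of how I understand the threshold selection. It is not DESeq2’s actual code, and the real implementation may differ in details. The function names (`bh_adjust`, `choose_filter_threshold`), the quantile grid, and `alpha=0.1` are my own assumptions for illustration, not a claim about what Galaxy uses. The sketch also assumes that raw p-values already exist for every gene before filtering.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    if n == 0:
        return p.copy()
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def choose_filter_threshold(mean_counts, pvals, alpha=0.1, quantiles=None):
    """Pick the mean-count cutoff that maximizes rejections at FDR `alpha`.

    Hypothetical helper: `alpha` and the quantile grid are illustrative
    assumptions, not values taken from DESeq2 or Galaxy.
    """
    mean_counts = np.asarray(mean_counts, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    if quantiles is None:
        quantiles = np.linspace(0.0, 0.95, 20)

    best_cutoff, best_rejections = None, -1
    for q in quantiles:
        cutoff = np.quantile(mean_counts, q)
        keep = mean_counts >= cutoff
        n_rej = int(np.sum(bh_adjust(pvals[keep]) < alpha))
        # strict '>' keeps the lowest cutoff among ties
        if n_rej > best_rejections:
            best_cutoff, best_rejections = cutoff, n_rej

    # filtered genes keep their raw p-value but get no adjusted p-value
    keep = mean_counts >= best_cutoff
    padj = np.full(pvals.shape, np.nan)
    padj[keep] = bh_adjust(pvals[keep])
    return best_cutoff, padj
```

In this sketch, a gene whose mean normalized count falls below the chosen cutoff keeps its raw p-value but gets NaN (NA) as its adjusted value, and the cutoff itself depends on the FDR level passed in as `alpha`.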
I ask because a gene of interest in my dataset is given a “strong” P-value (4.4e-8) but no Q-value (NA). Its normalized counts in one condition (4 replicates) range from 11 to 30 (mean = 10.6) and are 0.00 in all 4 replicates of the other condition. I therefore assume the gene was removed by independent filtering, though I’m not sure why it was. Since it was clearly included in the significance (Wald) test, I take it that filtering happens after testing but before multiple-test correction. I’m currently hacking to figure out the mean normalized counts threshold, but would appreciate guidance on the user-defined FDR setting used for the threshold criterion, or any insight into what I’ve described (i.e. whether I’m correctly interpreting why the gene above was filtered out, and that this happened post-test but before the Benjamini-Hochberg FDR correction).