Problems with DESeq2


#1

I am having issues with DESeq2. The job fails with this error message:

Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
Import genomic features from the file as a GRanges object … OK
Prepare the ‘metadata’ data frame … OK
Make the TxDb object … OK
‘select()’ returned 1:many mapping between keys and columns
Note: importing abundance.h5 is typically faster than abundance.tsv
reading in files with read.delim (install ‘readr’ package for speed up)
1 2 3 4 5 6
summarizing abundance
summarizing counts
summarizing length
using counts and average transcript lengths from tximport
Warning message:
In .get_cds_IDX(type, phase) :
The “phase” metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
estimating size factors
using ‘avgTxLength’ from assays(dds), correcting for library size
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
all gene-wise dispersion estimates are within 2 orders of magnitude
from the minimum value, and so the standard curve fitting techniques will not work.
One can instead use the gene-wise estimates as final estimates:
dds <- estimateDispersionsGeneEst(dds)
dispersions(dds) <- mcols(dds)$dispGeneEst
…then continue with testing using nbinomWaldTest or nbinomLRT
Calls: DESeq … estimateDispersions -> .local -> estimateDispersionsFit

I’m not sure how to fix this problem.


#2

Hello,

This error is discussed at length on the DESeq2/Bioconductor support forum, if you want to explore prior Q&A solutions (warning: not all of them will be available in Galaxy). The error would come up whether the tool is run in Galaxy or not. A Google search for “all gene-wise dispersion estimates are within 2 orders of magnitude” returns the top hits from https://support.bioconductor.org/

In summary, this is a data content-related error. It indicates either that there is a data mixup upstream (for example, the same sample processed more than once by accident) or that the given samples do not have enough expression variance between them for the dispersion fit to work. Biological replicates are expected; technical replicates will trigger the same problem, i.e. insufficient expression differences.
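
One quick way to test for an upstream mixup is to look for samples whose counts are identical. The snippet below is only a minimal sketch in base R, outside Galaxy: the matrix `counts`, the sample names, and the helper `find_identical_samples` are all made up for illustration, and you would substitute your own gene-by-sample count table.

```r
# Hypothetical toy count matrix: genes in rows, samples in columns.
# sampleC is deliberately an exact copy of sampleA to mimic a mixup.
counts <- cbind(
  sampleA = c(10, 200, 35, 0, 78),
  sampleB = c(50, 180, 40, 1, 90),
  sampleC = c(10, 200, 35, 0, 78)
)

# Illustrative helper (not part of DESeq2): return every pair of
# columns whose counts are exactly identical.
find_identical_samples <- function(mat) {
  pairs <- combn(colnames(mat), 2, simplify = FALSE)
  is_dup <- vapply(
    pairs,
    function(p) identical(mat[, p[1]], mat[, p[2]]),
    logical(1)
  )
  pairs[is_dup]
}

dup_pairs <- find_identical_samples(counts)
print(dup_pairs)

# Pairwise Spearman correlations between samples. Values at or very
# near 1 between supposed biological replicates deserve a second look.
print(round(cor(counts, method = "spearman"), 3))
```

If no duplicates turn up and the samples really are that similar, the fallback named in the error message itself (`estimateDispersionsGeneEst`, then `nbinomWaldTest` or `nbinomLRT`) can be run in R directly, though as noted above it may not be available in Galaxy.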