Error with DESeq2 "every gene contains at least one zero"

Has anyone seen this error when running DESeq2 in Galaxy?

Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
estimating size factors
Error in estimateSizeFactorsForMatrix(counts(object), locfunc = locfunc,  :
  every gene contains at least one zero, cannot compute log geometric means
Calls: DESeq ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix

Every gene in your count table has a zero count in at least one of your four samples. DESeq2 calculates a geometric mean for every gene across all samples, and that geometric mean is zero as soon as a single count is zero (so its log is undefined). A gene like that cannot be used as a reference. DESeq2 needs these geometric means for its size factor normalisation, so if no gene has a usable one, the size factors cannot be estimated and you get this error.

Example:

  • ID S1 S2 S3 S4
  • Gene_1 1 22 3 0
  • Gene_2 30 0 12 5
  • Gene_3 0 0 1 6

None of the genes above can be used by DESeq2 for the normalisation, because each of them has at least one zero count.
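To make this concrete, here is a small plain-Python sketch of the log geometric mean for the three example genes. It is only an illustration of the arithmetic, not DESeq2's actual code, and the helper name `log_geometric_mean` is made up for this example.

```python
import math

# Counts from the example above: gene -> counts for S1..S4.
counts = {
    "Gene_1": [1, 22, 3, 0],
    "Gene_2": [30, 0, 12, 5],
    "Gene_3": [0, 0, 1, 6],
}

def log_geometric_mean(row):
    """Mean of the log counts; -inf as soon as one count is zero."""
    if any(c == 0 for c in row):
        return -math.inf  # log(0) is -inf, so the mean is -inf as well
    return sum(math.log(c) for c in row) / len(row)

log_geo_means = {gene: log_geometric_mean(row) for gene, row in counts.items()}

# Only genes with a finite log geometric mean can serve as a reference.
usable = [gene for gene, value in log_geo_means.items() if math.isfinite(value)]

print(log_geo_means)
print("usable genes:", usable)
```

With this table the list of usable genes is empty, which is the situation the error message describes.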

What you need is something like:

  • ID S1 S2 S3 S4
  • Gene_1 1 22 3 1
  • Gene_2 30 1 12 5
  • Gene_3 0 0 1 6

That would be fine, since Gene_1 and Gene_2 have no zeros and can be used for the normalisation. Gene_3 still contains zeros, so DESeq2 leaves it out when it estimates the size factors.
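As a rough sketch of what happens next, the following plain-Python snippet estimates one size factor per sample with the median-of-ratios idea: for each sample, take the median of the log ratios between its counts and the gene-wise geometric means, using only genes without zeros. This is a simplified illustration under that assumption, not the DESeq2 implementation, and the function names are made up.

```python
import math
from statistics import median

samples = ["S1", "S2", "S3", "S4"]
counts = {
    "Gene_1": [1, 22, 3, 1],
    "Gene_2": [30, 1, 12, 5],
    "Gene_3": [0, 0, 1, 6],
}

def log_geo_mean(row):
    """Mean of the log counts; -inf if the row contains a zero."""
    if any(c == 0 for c in row):
        return -math.inf
    return sum(math.log(c) for c in row) / len(row)

log_geo_means = {gene: log_geo_mean(row) for gene, row in counts.items()}

# Genes with zeros (here Gene_3) are skipped as references.
reference_genes = [g for g, v in log_geo_means.items() if math.isfinite(v)]

def size_factor(sample_index):
    """Median of log(count / geometric mean) over the reference genes."""
    log_ratios = [
        math.log(counts[g][sample_index]) - log_geo_means[g]
        for g in reference_genes
    ]
    return math.exp(median(log_ratios))

size_factors = {s: size_factor(j) for j, s in enumerate(samples)}
print("reference genes:", reference_genes)
print(size_factors)
```

Two reference genes are of course far too few for a real analysis. The point is only that the calculation works as soon as at least one gene has no zeros.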

Solutions:

  1. Include more data.
  2. Add a pseudo-count of 1 to all counts, i.e., just add 1 to every entry. That's technically not cheating, because every entry is treated the same way and DESeq2 takes the log of the data anyway. If you do this, make sure you first remove the genes that have zeros in all samples, e.g., remove Gene_1 in this example:
  • ID S1 S2 S3 S4
  • Gene_1 0 0 0 0
  • Gene_2 30 0 12 5
  • Gene_3 0 0 1 6
  3. The DESeq2 tool in Galaxy has an option for this error as well. Look under "(Optional) Method for estimateSizeFactors". These methods are more sophisticated. Choose either ‘poscounts’ or ‘iterate’. Please read the documentation in the tool description for both methods before you use one of them.
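For options 2 and 3, a minimal plain-Python sketch may help. The first part drops the all-zero genes and adds the pseudo-count. The second part shows my understanding of the idea behind ‘poscounts’ as described in the DESeq2 documentation: a modified geometric mean that takes the n-th root of the product of the non-zero counts only. The helper `poscounts_geo_mean` is a made-up name for illustration, not a DESeq2 function, and in Galaxy you would do the filtering with a table manipulation tool rather than with code like this.

```python
import math

# Counts from the example in option 2: gene -> counts for S1..S4.
counts = {
    "Gene_1": [0, 0, 0, 0],
    "Gene_2": [30, 0, 12, 5],
    "Gene_3": [0, 0, 1, 6],
}

# Option 2: remove genes that are zero in all samples, then add 1 everywhere.
kept = {gene: row for gene, row in counts.items() if any(c > 0 for c in row)}
with_pseudocount = {gene: [c + 1 for c in row] for gene, row in kept.items()}
print(with_pseudocount)

# Option 3 (idea only): geometric mean over the non-zero counts,
# still dividing by the total number of samples n.
def poscounts_geo_mean(row):
    if all(c == 0 for c in row):
        return 0.0  # an all-zero gene stays unusable
    log_sum = sum(math.log(c) for c in row if c > 0)
    return math.exp(log_sum / len(row))

modified_geo_means = {gene: poscounts_geo_mean(row) for gene, row in counts.items()}
print(modified_geo_means)
```

With the modified geometric mean, Gene_2 and Gene_3 get a positive value even though they contain zeros, so they can still contribute to the normalisation.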