Please note that I posted the message below at the usegalaxy.org support on Oct. 20, but so far I have received no response. Thus, I am reposting it here in case someone can help. Many thanks in advance.
Initial post in usegalaxy.org support:
I am using a file derived from DESeq2 as input for the Volcano Plot tool. The whole pipeline works great and I do get the corresponding volcano plot, but I have two small issues:
- A few genes are highly expressed and at the same time have a p-value equal to 0; thus, these genes are depicted at the very top of the corresponding volcano plot:
My question is: how should I treat those genes? I definitely want to keep them because they are highly expressed, but at the same time the plot doesn't look right the way they are presented. I am not saying that it is wrong, just that it looks odd. Should I exclude those genes? Should I do anything else with them?
- On some occasions, the DESeq2 file that I am using as input shows, for the LogFC, p-value, and p-adjust of a few genes, the entry NA (I guess standing for Not Applicable) instead of any real value. Should I exclude these particular genes from the input that I am going to use for the volcano plot?
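For readers with the same two questions, here is a minimal Python sketch of one possible pre-processing step before plotting. It is not what the Galaxy Volcano Plot tool does internally. The three-column row layout and the function name `prepare_rows` are assumptions made up for this illustration. Rows whose statistics are NA are dropped, and zero p-values are replaced by the smallest non-zero p-value in the table so that -log10 stays finite. Whether flooring is scientifically appropriate for a given dataset is a separate judgment call.

```python
import math

def prepare_rows(rows):
    """Drop rows with NA statistics and floor zero p-values for plotting.

    `rows` is a list of (gene, log2fc, pvalue) string tuples
    (hypothetical layout, e.g. read from a tab-separated table).
    """
    parsed = []
    for gene, lfc, p in rows:
        if "NA" in (lfc, p):  # no statistic available for this gene
            continue
        parsed.append((gene, float(lfc), float(p)))

    # Smallest p-value that is still greater than zero.
    nonzero = [p for _, _, p in parsed if p > 0]
    # Fallback (assumption): smallest positive double if every p-value is zero.
    floor = min(nonzero) if nonzero else 5e-324

    # Replace zeros by the floor so that -log10(p) is finite.
    return [(g, lfc, -math.log10(max(p, floor))) for g, lfc, p in parsed]

rows = [
    ("geneA", "2.5", "0"),       # p-value reported as exactly zero
    ("geneB", "-1.2", "1e-30"),
    ("geneC", "NA", "NA"),       # dropped
    ("geneD", "0.3", "0.04"),
]
result = prepare_rows(rows)
for gene, lfc, y in result:
    print(gene, lfc, round(y, 2))
```

With this approach the zero-p-value genes stay in the plot, but they sit at the same height as the most significant gene that still has a measurable p-value.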
I would highly appreciate any help.
A p-value of 0 should be filtered out for not being significant. Instead, it looks like a filter was not applied, or that tool/step had a problem interpreting the content of the values evaluated.
This tutorial has examples for creating meaningful plots: Visualization of RNA-Seq results with Volcano Plot
Many thx for your response; I appreciate it: )
Could you please explain to me why p-value=0 should be excluded?
As I understand it, the p-value for the corresponding genes (gathering at the top of the graph) was extremely small, and because of that the program (Volcano Plot) could not show it properly and thus gave zero. Moreover, correct me if I am wrong, but the smaller the p-value, the more significant the difference among the different samples of the experiment, which in turn means small variation among the replicates of each sample type. In that sense, please have a look just below at the values from the rLog-Normalized counts file that I got from DESeq2 as part of the same analysis, for ALL of the genes appearing at the very top of my volcano plot:
Could it be that the different replicates (for each of the two sample types, Wt & Deletion) show extremely small variation among each other, and thus the p-value that DESeq2 calculates is zero? If that is the case, then why should I exclude these particular genes, which are highly reproducible and homogeneous regarding their normalized counts?
Sorry for insisting; I just want to understand the issue that I have. I hope that you understand.
@ This tutorial has examples for creating meaningful plots: Visualization of RNA-Seq results with Volcano Plot.
I did use this great tutorial to produce my Volcano indeed.
@ Instead, it looks like a filter was not applied.
A filter was actually applied:
P.S.: For your convenience, and just in case, I attach the same volcano plot with the genes at the top annotated:
I can reproduce your graph clumping up near the top by faking extremely small p-values (e100, e200, e300, e400 … e900).
It seems this is a known limitation of R data representation - R largest/smallest representable numbers - Stack Overflow
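The limit mentioned above comes from double-precision floating point, which R uses for numeric values. The same behaviour can be seen in any language that uses IEEE 754 doubles. The sketch below uses Python only as an illustration; the helper `neg_log10` is a name made up for this example and is not part of any tool in this thread.

```python
import math
import sys

# Smallest positive "normal" double, about 2.2e-308.
print(sys.float_info.min)

tiny = float("1e-320")       # still representable (as a subnormal number)
too_small = float("1e-400")  # below the representable range

print(tiny > 0)              # True
print(too_small == 0.0)      # True: the value underflowed to exactly zero

def neg_log10(p):
    """Return -log10(p), or infinity when p has underflowed to zero."""
    return math.inf if p == 0 else -math.log10(p)

print(neg_log10(tiny))
print(neg_log10(too_small))  # inf
```

A p-value that underflows this way is stored as exactly 0, so its -log10 has no finite value and a plotting tool has to decide where to put the point.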
The Bioconductor tool authors are aware of this; see this example topic (warning, older posts!): P Value of 0? You can also search there for how others have scientifically interpreted these tiny values and for any current additional advice.
Galaxy history that uses the inputs and workflow from the tutorial, plus an adjusted input with “extreme-pvalues” for the test. Everything is tagged and I’ll leave it shared as a reference. Galaxy | Accessible History | test rna-seq-viz-with-volcanoplot
Hope that helps!
It looks like you are losing significant digits in the table (Excel?).
Tools can report all kinds of odd values when there is a problem. A p-value that is actually just “0” means that something went wrong. In your case, this seems to be related to losing significant digits in intermediate applications, and that should be addressed.
Try running these tools in Galaxy if you are not already, and see what results. You won’t lose significant digits.
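The loss of significant digits described above can be illustrated with a small Python sketch. The p-value is a made-up example, and the fixed six-decimal format stands in for how an intermediate application might display or export a cell. This is an assumption for illustration, not a diagnosis of what happened to the table in this thread.

```python
p = 3.2e-250                # hypothetical, very small p-value

as_fixed = f"{p:.6f}"       # fixed number of decimals
as_sci = repr(p)            # full-precision scientific notation

print(as_fixed, float(as_fixed) == 0.0)  # 0.000000 True
print(as_sci, float(as_sci) == p)        # 3.2e-250 True
```

Once the value has been written out as 0.000000, reading it back gives exactly zero, whereas the scientific-notation text round-trips without loss.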
Thank you very much: )
All the info/links that you sent me, I believe, will help me to figure out how to interpret my data somehow.
Best wishes for a great weekend.