Hi there.
I am new to PCA plot analysis.
I have this graph, which shows quite a high percentage of variance for PC1 and a comparatively low one for PC2. Also, there are two clusters for the control samples.
So my question is: is my dataset bad or good? Can I accept the DE gene list that DESeq2 produced along with the plot?
Also, I have gone through tutorials online, but I would still like somebody to tell me the rules for accepting or rejecting a dataset based on a PCA plot, once and for all.
Hi @Sanjukta_Ghosh
This topic is related to data interpretation rather than Galaxy. You need an opinion from a statistician. Maybe check the help for limma/limma-voom, edgeR or DESeq2. This GTN tutorial covers PCA plots:
While I am not qualified to answer this question, I can comment on a couple of aspects. The 1st component explains 81% of the variability, and the samples display a reasonable separation on the 1st component. The values are very big, e.g. -15 to 10 for component 1, which I find somewhat unusual.
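To make "explains 81% of variability" concrete, here is a minimal sketch of how the percentage shown on each PCA axis is usually derived. It is not DESeq2's own code: the function name `explained_variance_ratio` and the toy matrix are made up for illustration, and the input is assumed to be a samples-by-genes matrix of already transformed expression values.

```python
import numpy as np

def explained_variance_ratio(values):
    """Fraction of total variance captured by each principal component.

    values: samples x genes array of transformed expression values
    (hypothetical input; not the data from this thread).
    """
    X = np.asarray(values, dtype=float)
    centered = X - X.mean(axis=0)  # center each gene across samples
    # Singular values of the centered matrix give the per-component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (X.shape[0] - 1)
    return variances / variances.sum()

# Toy data (made up): 6 samples x 50 genes, where the last 3 samples
# are shifted on the first 10 genes to mimic a strong group effect.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 50))
toy[3:, :10] += 8.0

ratios = explained_variance_ratio(toy)
print(np.round(ratios[:2], 2))  # share of variance on PC1 and PC2
```

When one factor, such as treatment versus control, dominates the data, most of the variance lands on PC1, so a high PC1 percentage is not by itself a sign of a bad dataset.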
Check the control samples. Do you see any obvious differences, such as data obtained in different runs, samples treated or processed separately, or something else?
No Igor, I could not find any of the differences you mentioned. In fact, the background paper does not provide any information that could help identify a possible source of the difference.