Issue with edgeR results for endometrial dataset

Hi,

I’m facing an issue while running edgeR on one of my datasets (endometrial samples). The same pipeline and settings worked correctly for my colorectal dataset, but for this one I’m consistently getting flat results (0 upregulated and 0 downregulated genes).

I’ve rechecked the input count matrix and rerun the analysis multiple times, but the issue persists. Could you please help me check if there might be a problem with how the data is being interpreted or processed on Galaxy?

Thanks in advance.

Best,
Monish

Please find attached the history link for the workflow.

Hello @Monish_V

I don’t see an obvious problem with the technical steps we can see, but to explore this scientifically, you could compare against a pipeline that performs the mapping and counting as distinct steps! This may help you tune the RNA STAR parameters further. Below are some Galaxy resources we can suggest for the exploration!


This workflow from the IWC workflow library is one choice. It runs QA through counting. These counts can be input into any of the Bioconductor differential expression tools!

Then, as an example, this workflow consumes the output of the first and generates statistics with DESeq2.

If you didn’t perform QA on the reads first, that is one area to go back to in order to improve the mapping/feature capture results with your current pipeline. Trimming leftover adapter sequence may be more important for some samples.

Or, you can try a template workflow like this one (or, see the fastp step included in the first counting workflow above!).

I would also suggest confirming the strand for these reads! This exploration can inform about upstream data preparation steps, too. The workflows from this intro tutorial can run some numbers for you.

And this step in another tutorial covers reviewing the mapping in IGV.

Then this step in another tutorial explains more about how to review results from tools like Infer Experiment.
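
As a rough illustration of the strand check, here is a minimal Python sketch that turns the two fractions reported by a tool like Infer Experiment into a strandedness call. The function name and both cutoffs (`dominant`, `balance_tolerance`) are hypothetical choices for illustration only, not settings from any Galaxy tool, and the example fractions are made up.

```python
def classify_strandedness(forward_fraction, reverse_fraction,
                          dominant=0.8, balance_tolerance=0.1):
    """Classify library strandedness from two read fractions.

    forward_fraction: fraction of reads consistent with the forward
        (sense) orientation, e.g. "1++,1--,2+-,2-+" for paired-end data.
    reverse_fraction: fraction consistent with the reverse orientation,
        e.g. "1+-,1-+,2++,2--" for paired-end data.
    The cutoffs are illustrative assumptions, not official thresholds.
    """
    if forward_fraction >= dominant:
        return "forward"
    if reverse_fraction >= dominant:
        return "reverse"
    # Roughly equal fractions suggest an unstranded library.
    if abs(forward_fraction - reverse_fraction) <= balance_tolerance:
        return "unstranded"
    # Anything in between deserves a closer look at the data.
    return "ambiguous"


# Made-up fractions, purely for illustration.
print(classify_strandedness(0.02, 0.96))
print(classify_strandedness(0.49, 0.50))
```

Whatever call comes out of a check like this should match the strand setting used in the counting step, and it should be the same for both datasets being compared.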


Overall, a workflow would likely help ensure you are running the exact same steps and parameters for your two groups of samples. It could be extracted from one of the histories, tuned to ensure it has the correct inputs and connections, and then run on both as a batch.

Hope this helps! :slight_smile:

Hello @jennaj

Apologies for the late reply.
I reran the analysis with one significant change: I set the adjusted P-value threshold to 0.2, and it did produce results.

https://usegalaxy.eu/api/datasets/26c75dcccb616ac886478b60fff9df0e/display?to_ext=html

https://usegalaxy.eu/api/datasets/26c75dcccb616ac838e14aaf8f33ac87/display?to_ext=tabular

Upon inspecting the results, I was able to find some biological significance in relation to my research.
However, I would be grateful to know, in your expert opinion, whether this was an appropriate thing to do, considering that this threshold allows a higher rate of false positives. If not, I will try the alternative workflows you’ve provided.
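
To make the threshold question concrete, here is a minimal Python sketch of how an adjusted P-value cutoff changes the number of genes called significant. It assumes the Benjamini-Hochberg (FDR) adjustment; the function names and the P-values are made up for illustration and are not taken from this dataset or from the edgeR tool itself.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, fdr):
    """Count tests whose adjusted P-value is at or below the FDR cutoff."""
    return sum(1 for q in benjamini_hochberg(pvalues) if q <= fdr)


# Hypothetical raw P-values, purely for illustration.
example_pvalues = [0.001, 0.008, 0.02, 0.04, 0.3, 0.6, 0.9, 0.05, 0.1, 0.45]

for cutoff in (0.05, 0.2):
    print(cutoff, count_significant(example_pvalues, cutoff))
```

With an FDR cutoff of 0.2, up to about one in five of the genes called significant are expected to be false positives, compared with about one in twenty at 0.05. Whether that trade-off is acceptable depends on how the gene list will be used downstream.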

I’d also like to mention that I’ve removed some outlier samples that showed up on the BCV plot.

Thank you
Kind regards
Monish

Hi @Monish_V

I would suggest that you consult a statistician for feedback about the scientific interpretation of results, as this is beyond the scope of what we can help with here! The tool packages we host inside of Galaxy are the original base tools most people are familiar with, and all of the versioning is captured in case that is needed.

It is good to know that you have worked out how to avoid the technical errors! That can be half the battle! :scientist: