edgeR "Filter lowly expressed genes" bug

I ran edgeR with this option turned on and it truncated my results: roughly the last third of my genes were missing. On a hunch I turned the option off, and the results then included all of the genes. I can provide specifics if needed, but you may want to look into the wrapper code around this feature, since I don’t think it’s doing what it’s supposed to. Note: all other parameters were the same between runs.

What do you mean by truncated? The tool is expected to remove genes if “Filter lowly expressed genes” is selected. The number of genes filtered and the total number of genes are shown in the report under “Additional Information”. The number of genes remaining after filtering is the total of the 3 numbers shown in the “Differential Expression Counts” table in the report, and it should match the number of rows in the output table (minus 1 for the header row), e.g. see below. Is that not what you’re seeing?
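As a rough illustration of that consistency check, here is a minimal Python sketch. It assumes the output table is tab-separated with a single header row, and that the three numbers from the “Differential Expression Counts” table have been copied by hand into a dictionary; the key names (`Down`, `NotSig`, `Up`) and the function names are made up for this example and are not part of the tool.

```python
import csv
import io


def count_data_rows(table_text: str) -> int:
    """Count the rows of a tab-separated table, excluding the header row."""
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    return max(len(rows) - 1, 0)


def counts_are_consistent(de_counts: dict, table_text: str) -> bool:
    """Check that the three report numbers add up to the number of table rows.

    de_counts holds the three values from the report, typed in by hand
    (hypothetical keys: "Down", "NotSig", "Up").
    """
    return sum(de_counts.values()) == count_data_rows(table_text)


# Tiny made-up table: one header row plus three genes.
example_table = "GeneID\tlogFC\ng1\t1.0\ng2\t-0.5\ng3\t0.1\n"
example_counts = {"Down": 1, "NotSig": 1, "Up": 1}
print(counts_are_consistent(example_counts, example_table))
```

If the two numbers disagree, that points to a problem with the output table itself rather than with the filtering threshold.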

Note that the threshold for filtering lowly expressed genes should be chosen based on the numbers of reads and samples, e.g. see the “Filtering to remove low counts” section in the edgeR workflow article here: https://f1000research.com/articles/5-1438/v2
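To make the idea of the filter concrete, here is a small Python sketch of a counts-per-million (CPM) filter that keeps a gene only if it reaches a minimum CPM in a minimum number of samples. This is an illustration of the general technique, not the wrapper’s actual code: it uses raw library sizes, the `>=` comparison is an assumption, and the default thresholds are placeholders rather than recommendations.

```python
def cpm(counts):
    """Convert raw counts to counts per million using raw library sizes.

    counts: dict mapping gene ID -> list of raw counts, one per sample.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [c * 1e6 / lib if lib else 0.0 for c, lib in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def filter_low_expression(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples.

    min_cpm and min_samples are placeholder defaults; pick real values
    from your own library sizes and group sizes.
    """
    cpms = cpm(counts)
    return {
        gene: counts[gene]
        for gene, values in cpms.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }


# Made-up counts for three genes across three samples.
example = {
    "high": [900, 950, 1000],
    "low": [0, 1, 0],
    "mid": [100, 49, 0],
}
kept = filter_low_expression(example, min_cpm=5000, min_samples=2)
print(sorted(kept))
```

A filter of this kind drops genes based on their expression values only, so the genes it removes should be scattered through the table, not grouped by position.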

Right, so it wasn’t that I was just missing a certain number of genes; the output seemed to be cut off at a certain gene. Say I have 10,000 genes: they were all there in my count tables, but the edgeR output only showed genes 1-9,800. I wasn’t just looking at the number of rows; for some reason the tool gave me no output for the last 200 genes (these are example numbers; the real numbers are messier, but I could send you the two outputs to look at if you’d like). I went back and checked the gene counts to see whether those last 200 were just lowly expressed, and they definitely weren’t. Some of them were, but many had high counts and should have been included. So the tool seems to have cut the gene list off at some point, almost as if an upper limit had been exceeded.
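One way to check this kind of observation is to compare the gene IDs in the count table with those in the edgeR output and see whether the missing genes form a single block at the end of the input order. The Python sketch below assumes the gene IDs are unique and have already been read into lists; the function names are invented for this example.

```python
def missing_genes(input_ids, output_ids):
    """Return the input gene IDs that are absent from the output, in input order."""
    present = set(output_ids)
    return [gene for gene in input_ids if gene not in present]


def is_tail_truncation(input_ids, output_ids):
    """True if every missing gene sits in one unbroken block at the end of the input.

    The order of output_ids does not matter, so a sorted output table is fine.
    """
    missing = missing_genes(input_ids, output_ids)
    if not missing:
        return False
    return input_ids[-len(missing):] == missing


# Made-up IDs: ten input genes, with the last two absent from the output.
input_ids = [f"gene{i}" for i in range(1, 11)]
output_ids = input_ids[:8]
print(missing_genes(input_ids, output_ids))
print(is_tail_truncation(input_ids, output_ids))
```

Missing genes scattered through the table would be consistent with expression-based filtering, while a single block at the end would be consistent with the cutoff described above.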

Hmm, it would be good to see the inputs and the parameters as well as the outputs to see what’s going on. Where are you running your analysis? Is it on a public Galaxy? @jennaj, could I get access to this user’s data without them having to share it publicly, if they’d prefer not to do that?

Yes, I’d rather not share it publicly, but they’re jobs 1728-1730 with the filter on and 1735-1737 with the filter off. And it’s on the public Galaxy.

Sierra Baney

I’ll take a look if the email used to register here is the same as your Galaxy account at Galaxy Main https://usegalaxy.org. Checking now. I’m assuming it would be OK to share this with @mblue9 (privately) for troubleshooting, but let us know if not.

@sdbaney The other options include:

  • send @mblue9 a direct private message here in Galaxy Help that includes a shared history link (shared “by link”, not “published”). The share link can be from most Galaxy servers. If you are not sure how to send a direct message to another user yet, go to your Profile > Messages and ask @discobot for the how-to (will walk you through a short tutorial).

  • if you use a different email at Galaxy Main versus here, you can send me a direct message here that includes the registered Galaxy Main email address and I can look for the history and share it with @mblue9.

  • if working at Galaxy Main, you could also send in a shared history link in an email to galaxy-bugs@lists.galaxyproject.org and include the dataset numbers and the link to this post so we can associate the two. I can forward this to @mblue9 via email.

One of those should work for this.

Hi! To find out what I can do, say @discobot display help.

^^^ Do that in your own messages in a private thread, instead of posting all the info back here in the Q&A. It is good to learn this – the help/tutorials are interactive and very short :slight_smile:

Hi - I found the history and shared a copy with @mblue9. Let’s let her review; I don’t see anything odd from my quick look. Thanks!