Transcriptomics troubleshooting

Hello @jennaj,

I am having the same problem. I am the admin for the Galaxy instance that we use at our institution. I imported the RNA-seq deg-analysis workflow that you shared in an earlier post from Galaxy workflows. It ran without any errors and generated outputs for the example dataset. However, I had someone double-check the workflow with data from NCBI. I have another workflow designed to get counts, and it ran on this data without any errors. However, when they ran these counts through the above workflow, it gave an error at the goseq stage.

Using manually entered categories.
Error in `[.default`(summary(map), , 1) : incorrect number of dimensions
Calls: run_goseq ... goseq -> reversemapping -> [ -> [.table -> NextMethod

In the above answer, you indicated double-checking my inputs. I have downloaded and looked at it, but is there another way to double-check in Galaxy? The tag “input” didn’t give me many posts.

I compared the input files given to goseq for this test dataset with the input files for the example dataset that worked earlier. They look very similar. Is there anything else I could do? Is there a way I can share the history and the objects within it so someone can have a look?

Thank you so much!
Priyanka

Hi @Priyanka_Bhandary

For this kind of data – I usually look at the common keys between the inputs first. Computers are literal: Chr1, chr1, and 1 all mean something different.

Also check for: extra whitespace, empty values, blank trailing lines, values with a .N suffix (where N is a version) being compared to values without the version, values that contain whitespace (check your fasta > lines) … and more.
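A rough sketch of that kind of key comparison, done outside Galaxy on downloaded files. The identifiers and the `normalize_key` / `compare_keys` helpers are made up for illustration, and the normalization only covers whitespace and a trailing .N version, not every case listed above.

```python
import re


def normalize_key(key: str) -> str:
    """Strip surrounding whitespace and a trailing .N version suffix."""
    return re.sub(r"\.\d+$", "", key.strip())


def compare_keys(keys_a, keys_b):
    """Return the keys shared by both inputs and the keys unique to each."""
    a = {normalize_key(k) for k in keys_a if k.strip()}
    b = {normalize_key(k) for k in keys_b if k.strip()}
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}


# Hypothetical example values: first column of a count table vs. an annotation file
counts_ids = ["ENSG00000141510.17", " ENSG00000012048", "ENSG00000139618\t", ""]
annotation_ids = ["ENSG00000141510", "ENSG00000012048", "TP53"]

# A literal comparison finds nothing in common, because of versions and whitespace
raw_shared = set(counts_ids) & set(annotation_ids)

report = compare_keys(counts_ids, annotation_ids)
print("raw shared:", sorted(raw_shared))
print("shared after normalizing:", sorted(report["shared"]))
print("only in counts:", sorted(report["only_a"]))
print("only in annotation:", sorted(report["only_b"]))
```

If the "shared" set is empty or much smaller than expected, the mismatch is in the keys themselves and not in the tool.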

If the basic formats are intact, further checks usually involve:

  1. Genome assembly/build – use the exact same source for all inputs or expect problems. Or, you can get fancy and convert between builds – and that would be extra, upstream data prep steps (common manipulations, but not really “simple”).
  2. Chromosome names
  3. Gene names
  4. Transcript names
  5. Computed values – scientific notation versus not, and consistently used

The error you got is probably about missing/empty values, or values that are not matching up (not unique, or become not unique during processing/data reduction).

Columns of 0 values without a header describing the sample in a singleUniqueWord would be one example I’ve seen. Maybe look at the data immediately upstream from GOSEQ first? Does the tool work when you run it directly? Or does it fail the same way the workflow did?
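For a downloaded tabular file, those content checks could be sketched roughly like this. It assumes a tab-separated table with a header line and identifiers in the first column; the `check_table` helper and the sample data are hypothetical, not part of any Galaxy tool.

```python
import csv
import io
from collections import Counter


def check_table(text: str, delimiter: str = "\t"):
    """Return a list of human-readable problems found in a small tabular file."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, body = rows[0], rows[1:]
    problems = []

    # Headers should be a single word without whitespace
    for name in header:
        if not name.strip() or any(ch.isspace() for ch in name.strip()):
            problems.append(f"header {name!r} is empty or contains whitespace")

    # Every row should have the same number of fields, and none should be empty
    for line_no, row in enumerate(body, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: {len(row)} fields, expected {len(header)}")
        elif any(not value.strip() for value in row):
            problems.append(f"line {line_no}: empty value")

    # Identifiers in the first column should be unique
    id_counts = Counter(row[0] for row in body if row)
    for key, n in id_counts.items():
        if n > 1:
            problems.append(f"duplicate ID {key!r} ({n} times)")

    # A value column that is all zeros is suspicious
    for col in range(1, len(header)):
        values = [row[col] for row in body if len(row) == len(header)]
        if values and all(v.strip() in {"0", "0.0"} for v in values):
            problems.append(f"column {header[col]!r} is all zeros")

    return problems


# Hypothetical count table with several of the problems described above
example = "gene_id\tsample A\tsample_B\nG1\t5\t0\nG2\t\t0\nG1\t7\t0\n"
problems = check_table(example)
for problem in problems:
    print(problem)
```

An empty list does not prove the inputs are right, but a non-empty one is a good place to start looking.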

Or try what I usually do instead: check that the inputs make sense first, then review the workflow steps in order to make sure some stray setting isn’t causing the problem. Why do I back all the way up? Trying to diagnose technical problems based on scientific results is so much harder, and a problem might go unnoticed right up until the final data reduction. Or it might be missed entirely! Not all problems will fail a tool; some just produce weird results.

Most of the Q&A at this forum involves some variation of the problems above. The solutions vary but are still similar, and the tools in this tutorial (more of a guide actually) can help to find content problems: Data Manipulation Olympics

If you are still stuck after checking the above, try to reproduce the error with the smallest data possible at a public server. Then share the history back and we’ll take a look.
→ Troubleshooting errors

Thank you so much for the detailed answer. I figured out the problem! The issue was with selecting the gene names in the drop-down menu in the goseq tool. The gene annotation file was using gene symbols and the default for goseq in the workflow was the Ensembl gene IDs. This inconsistency caused the problem. It was really hard to pin the error down though because the R error message wasn’t helpful. Thank you so much @jennaj ! Your reply helped me think through what could be going wrong!
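In case it helps someone else, here is a rough sketch of a check that could flag this kind of mismatch before running goseq. The regular expression is an assumption that only fits Ensembl gene IDs with the usual ENS…G prefix followed by 11 digits (for example human or mouse); other species and ID schemes would need a different pattern, and the `guess_id_type` helper is made up for illustration.

```python
import re

# Assumed pattern: ENS + optional species code + G + 11 digits, optional .N version
ENSEMBL_GENE = re.compile(r"^ENS[A-Z]*G\d{11}(\.\d+)?$")


def guess_id_type(ids):
    """Classify a list of identifiers as 'ensembl', 'other', 'mixed', or 'empty'."""
    cleaned = [i.strip() for i in ids if i.strip()]
    if not cleaned:
        return "empty"
    hits = sum(1 for i in cleaned if ENSEMBL_GENE.match(i))
    if hits == len(cleaned):
        return "ensembl"
    if hits == 0:
        return "other"  # e.g. gene symbols
    return "mixed"


# Hypothetical first columns of two goseq inputs
annotation_ids = ["TP53", "BRCA1", "BRCA2"]
counts_ids = ["ENSG00000141510", "ENSG00000012048", "ENSMUSG00000059552"]

print("annotation:", guess_id_type(annotation_ids))
print("counts:", guess_id_type(counts_ids))
if guess_id_type(annotation_ids) != guess_id_type(counts_ids):
    print("ID types differ: check the gene ID setting in the tool form")
```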

Thank you,
Priyanka Bhandary
