AnnotateMyIDs warning, and the same warning reappears with a fatal error during limma DE

I'm very new to Galaxy and to bulk RNA-seq analysis, and I am following the workflow from the tutorials "1: RNA-Seq reads to counts" and "2: RNA-seq counts to genes".

When I annotate my count matrix with AnnotateMyIDs, the job runs successfully and produces an output, but it reports the message below. The count matrix and the annotated output have the same number of lines (27180).

Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/annotatemyids/annotatemyids/3.18.0+galaxy0
Job State: ok
Command Line: Rscript '/corral4/main/jobs/056/837/56837774/configs/tmpq6bgsh6b'
Tool Standard Output: empty
Tool Standard Error:
Warning message:
In Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
'select()' returned 1:1 mapping between keys and columns
Tool Exit Code: 0
Job API ID: bbd44e69cb8906b56125df1b1268dae6

After this, when I run limma for differential expression analysis, the job fails with this message:
Warning message:
In Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 2 elements
Calls: read.table -> scan

I can't figure out what is going wrong.

History with the datasets: Galaxy

Hi @Shivani_Mandal

I'm not able to see the shared history. If you still need help, try sharing it again. Here is how: How to get faster help with your question

You could also compare your inputs against this guide, since it covers what we would check first, too: FAQ: Extended Help for Differential Expression Analysis Tools

For a quick guess, this error message

line 1 did not have 2 elements

means that the tool ran into a line that had only one column of data when it was expecting two.

That could be a header line that doesn't have two columns. If your count files include headers, the header line will usually look something like this:

GeneID SampleN

where SampleN is a name that is unique across all the count files input to a single run.
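A quick way to find such a line before rerunning the job is to count the fields on every line of each count file. The Python sketch below does that. `find_bad_lines` and `check_file` are hypothetical helpers, and the sketch assumes tab-separated count files with exactly two columns (GeneID and count), which is how the error reads rather than something confirmed from the tool's source. The example rows are made up.

```python
def find_bad_lines(lines, expected_fields=2, sep="\t"):
    """Return (line_number, field_count) pairs for lines with the wrong number of fields.

    Line numbers are 1-based physical line numbers; blank lines are skipped.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # ignore empty or whitespace-only lines
        field_count = len(text.split(sep))
        if field_count != expected_fields:
            bad.append((number, field_count))
    return bad


def check_file(path, expected_fields=2):
    """Print every problem line in a count file downloaded from Galaxy."""
    with open(path, encoding="utf-8") as handle:
        for number, count in find_bad_lines(handle, expected_fields):
            print(f"{path}: line {number} has {count} field(s), expected {expected_fields}")


# Made-up example: the header uses a space instead of a tab,
# so line 1 is read as a single field.
example = ["GeneID Sample1\n", "geneA\t120\n", "geneB\t0\n"]
print(find_bad_lines(example))  # → [(1, 1)]
```

A header typed with spaces, or an annotation column appended to the counts, would both show up here as a line whose field count is not 2.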

The other possibility is a truncated file, but it sounds like you have already confirmed that all count files have the same number of lines (which is consistent with all sample counts having been generated against the same set of GeneIDs).
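Matching line counts are a good sign, but they do not prove the GeneIDs match. The sketch below goes one step further and checks that every count file lists the same GeneIDs as the first one. `gene_ids` and `mismatched_samples` are hypothetical helpers in Python, assuming tab-separated files with the GeneID in the first column and a single optional header line. It is deliberately strict and also flags files whose GeneIDs are in a different order, which may or may not matter for the tool you are using. The sample data is made up.

```python
def gene_ids(lines, has_header=True, sep="\t"):
    """Collect the first column (assumed GeneID) of a count file, skipping blank lines."""
    rows = [line.rstrip("\r\n") for line in lines if line.strip()]
    if has_header:
        rows = rows[1:]  # drop the assumed single header line
    return [row.split(sep, 1)[0] for row in rows]


def mismatched_samples(count_files, has_header=True):
    """Return names of count files whose GeneIDs differ from the first file's.

    count_files maps a sample name to the lines of its count file. Truncated
    files, and files with the same length but different or reordered GeneIDs,
    are reported.
    """
    names = list(count_files)
    if not names:
        return []
    reference = gene_ids(count_files[names[0]], has_header)
    return [
        name
        for name in names[1:]
        if gene_ids(count_files[name], has_header) != reference
    ]


# Made-up example with one truncated file and one reordered file.
files = {
    "S1": ["GeneID\tS1\n", "G1\t5\n", "G2\t0\n"],
    "S2": ["GeneID\tS2\n", "G1\t3\n", "G2\t7\n"],
    "S3": ["GeneID\tS3\n", "G1\t3\n"],
    "S4": ["GeneID\tS4\n", "G2\t1\n", "G1\t2\n"],
}
print(mismatched_samples(files))  # → ['S3', 'S4']
```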

Let us know if you solve this, and what the root issue was :slight_smile: A quick explanation would help others who find this topic later on. Or, share the history again and we can take a closer look.