edgeR row names error

Hi all,
I am trying to use edgeR for differential gene expression analysis. When I run edgeR with htseq-count files as the input, one file per sample and two count files per group within the same factor, I end up with the following error:

Warning message:
In Sys.setlocale("LC_MESSAGES", "en_US.UTF-8") :
OS reports request to set locale to "en_US.UTF-8" cannot be honored
Error in `.rowNamesDF<-`(x, value = value) :
duplicate ‘row.names’ are not allowed
Calls: row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
Warning message:
non-unique value when setting ‘row.names’: ‘X0’

I’m unsure how to troubleshoot this. I searched through the forum and was unable to find an answer.

I’m sorry if this has been asked and I accidentally overlooked it!

Thanks for any/ all help!

Dan


Hi @Dan

This type of error message from a Bioconductor or R tool usually means that there is a labeling problem or a design problem. The details of the message usually hold some clues. Those can be searched at the Bioconductor support forum, https://support.bioconductor.org/, to limit the results (some parts of the message can also be produced by other R-based tools).

This message is stating that the tool ran into what it thinks was a “duplicated” label when assigning the row “names” in an intermediate processing file. More likely, it is actually reporting that a label it was looking for couldn’t be found (this is what the ‘X0’ is referencing).

Click into the “i” icon for a dataset to see what was used to produce it. This is an easier view to review tool inputs and parameters for potential problems.

Next, expand the input datasets in that view.

Group (sample) names must be unique across everything included in the same analysis job, and they must match the headers of the count data inputs. What to fix depends a bit on how you are supplying the design matrix and count data.

  • If using Single Count Matrix mode: Do the count files supplied have headers containing sample names that exactly match the values supplied in each matrix? If not, the tool cannot match the data up, and you’ll need to add those in. This exact error has come up in earlier failures with this same situation, so it is my best guess so far.

  • If using Separate Count Files mode: Similar checks apply, but those sample labels are supplied on the tool form → Factor and Group and (optionally) Contrast.
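If you want to run the same comparison outside of Galaxy, here is a minimal Python sketch of the checks described above. The function name `check_labels` and the example labels are hypothetical, and this is not how the edgeR tool itself does the matching. It only reports labels that are repeated in the design, and labels that appear on one side but not the other.

```python
from collections import Counter


def check_labels(design_labels, header_labels):
    """Compare the sample labels from the design with the count-data headers.

    Both arguments are plain lists of strings. The comparison is exact and
    case sensitive, since most tools read labels literally.
    """
    # Labels that occur more than once in the design.
    duplicated = sorted(
        label for label, n in Counter(design_labels).items() if n > 1
    )
    # Labels named in the design that have no matching count column.
    missing_from_counts = sorted(set(design_labels) - set(header_labels))
    # Count columns that the design never mentions.
    not_in_design = sorted(set(header_labels) - set(design_labels))
    return {
        "duplicated": duplicated,
        "missing_from_counts": missing_from_counts,
        "not_in_design": not_in_design,
    }


# Hypothetical example: "Mut1" was entered twice and "Mut2" was left out.
report = check_labels(
    ["WT1", "WT2", "Mut1", "Mut1"],
    ["WT1", "WT2", "Mut1", "Mut2"],
)
print(report)
```

Any non-empty list in the report is a label to fix before rerunning the job.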

Give that a review, make adjustments, and see if you can resolve the missing or conflicting labels.

If you get stuck, please post back a #sharing-your-history link publicly as a reply, or ask for a moderator to start up a direct message chat to share it in. Please leave all inputs and outputs undeleted, and note the error dataset number with the problem.



Quote from the tool form help about how to format these data:

Labels (all sections)

NOTE: Please only use letters, numbers or underscores (case sensitive), and the first character must be a letter
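The rule in that note can be written as a regular expression. This is a minimal sketch in Python, assuming the rule means ASCII letters, digits, and underscores only. The names `LABEL_PATTERN` and `is_valid_label` are made up for illustration and are not part of the tool.

```python
import re

# First character must be a letter; the rest may be letters, digits, or
# underscores. Assumption: "letters" means ASCII A-Z and a-z.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_label(label):
    """Return True if the whole label follows the rule quoted above."""
    return LABEL_PATTERN.fullmatch(label) is not None


for candidate in ["WT1", "Mut_2", "1WT", "WT 1", "WT-1"]:
    print(candidate, is_valid_label(candidate))
```

Labels with a leading digit, a space, or a hyphen are rejected here, which covers the usual formatting slips.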

There is a lot of Q&A at this forum about “format” issues with labels. I added a few tags to your post that will find most of those. Or, better, just check your own labels and simplify or add to them as needed: no extra spaces, and only the characters the tool knows how to read. Most tools are quite literal when interpreting keys/values.

Inputs

Counts Data:
The counts data can either be input as separate counts files (one sample per file) or a single count matrix (one sample per column). The rows correspond to genes, and columns correspond to the counts for the samples. Values must be tab separated, with the first row containing the sample/column labels and the first column containing the row/gene labels. The sample labels must start with a letter. Gene identifiers can be of any type but must be unique and not repeated within a counts file.

Example - Separate Count Files:

GeneID WT1
11287 1699
11298 1905
11302 6
11303 2099
11304 356
11305 2528

Example - Single Count Matrix:

GeneID WT1 WT2 WT3 Mut1 Mut2 Mut3
11287 1699 1528 1601 1463 1441 1495
11298 1905 1744 1834 1345 1291 1346
11302 6 8 7 5 6 5
11303 2099 1974 2100 1574 1519 1654
11304 356 312 337 361 397 346
11305 2528 2438 2493 1762 1942 2027
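A quick way to check a counts file against the format rules above is sketched below in Python. The function name `check_counts_table` is hypothetical, the checks are only my reading of the help text, and the tool may apply others. It takes the file content as a string and returns a list of problems.

```python
import csv
import io
from collections import Counter


def check_counts_table(text):
    """Return a list of format problems found in tab-separated counts text."""
    rows = [r for r in csv.reader(io.StringIO(text), delimiter="\t") if r]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    # Every header cell after the first (gene ID) column is a sample label.
    for label in header[1:]:
        if not label[:1].isalpha():
            problems.append(
                f"sample label does not start with a letter: {label!r}"
            )
    # Each data row should have as many cells as the header row.
    for row in body:
        if len(row) != len(header):
            problems.append(f"wrong number of columns for gene {row[0]!r}")
    # Gene identifiers must not repeat within a file.
    for gene, n in Counter(row[0] for row in body).items():
        if n > 1:
            problems.append(f"duplicate gene ID: {gene!r}")
    return problems


# A file with a header row passes; a headerless one does not.
with_header = "GeneID\tWT1\n11287\t1699\n11298\t1905\n"
without_header = "11287\t0\n11298\t1905\n"
print(check_counts_table(with_header))
print(check_counts_table(without_header))
```

In the second example the first data row is read as the header, so the "sample label" is a count. R's `make.names` would render a header cell of `0` as `X0`, so a headerless file is one possible, unconfirmed source of the label in the error above.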

Tutorials with example data, methods, and workflows are also available.

Hi jennaj,

Thanks for your advice! I was using the individual htseq-count files exactly as the HTSeq tool produced them, without modifying them in any way, so I’m unsure why they were giving me trouble. I ended up combining the files into a single count matrix and that fixed the problem.

Thanks again!
Dan
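For anyone who lands here with the same error: the thread does not say how the files were combined, so the following is only a sketch of one way to do it in Python outside of Galaxy. It assumes each htseq-count file has two tab-separated columns (gene ID, count) and no header row. The function name `merge_counts`, the `GeneID` header, and the choice to fill a gene missing from a sample with `0` are all assumptions for illustration.

```python
import csv
import io


def merge_counts(samples):
    """Combine per-sample count text into one tab-separated count matrix.

    `samples` maps a sample label to the text of a two-column,
    headerless counts file. Columns follow the order of the dict.
    """
    genes = []   # gene IDs in the order they are first seen
    table = {}   # gene ID -> {sample label: count}
    for label, text in samples.items():
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            # Skip blank or malformed lines and htseq-count summary rows
            # such as __no_feature and __ambiguous.
            if len(row) != 2 or row[0].startswith("__"):
                continue
            gene, count = row
            if gene not in table:
                table[gene] = {}
                genes.append(gene)
            table[gene][label] = count
    labels = list(samples)
    lines = ["\t".join(["GeneID"] + labels)]
    for gene in genes:
        # Assumption: a gene absent from one sample is written as 0.
        cells = [table[gene].get(label, "0") for label in labels]
        lines.append("\t".join([gene] + cells))
    return "\n".join(lines) + "\n"


merged = merge_counts({
    "WT1": "g1\t5\ng2\t0\n__no_feature\t10\n",
    "WT2": "g1\t7\ng2\t3\n",
})
print(merged)
```

The header row written here is what gives each column a proper sample label, which the single count matrix format quoted earlier requires.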
