EdgeR row names error

Dan · December 14, 2022, 7:12pm

Hi all,
I am trying to use EdgeR for differential gene expression analysis. When I try to run EdgeR using htseq-count files for each group in the same factor as the input (two count files per group), I end up with the following error:

Warning message:
In Sys.setlocale(“LC_MESSAGES”, “en_US.UTF-8”) :
OS reports request to set locale to “en_US.UTF-8” cannot be honored
Error in .rowNamesDF<-(x, value = value) :
duplicate ‘row.names’ are not allowed
Calls: row.names<- → row.names<-.data.frame → .rowNamesDF<-
Warning message:
non-unique value when setting ‘row.names’: ‘X0’

I’m unsure how to troubleshoot this and have searched through here and was unable to find answers.

I’m sorry if this has been asked and I accidentally overlooked it!

Thanks for any/ all help!

Dan

jennaj · December 14, 2022, 9:32pm

Hi @Dan

This type of error message from any Bioconductor or R tool means that there is a labeling problem or a design problem. The details usually hold some clues. Those can be searched at the Bioconductor support forum https://support.bioconductor.org/ to narrow the results (some parts of the message can also be produced by other R-based tools).

This message is stating that the tool ran into what it thinks was a “duplicated” label when assigning the row “names” in an intermediate processing file. More likely, it is actually reporting that a label it was looking for couldn’t be found (this is what the ‘X0’ is referencing).

Click into the “i” icon for a dataset to see what was used to produce it. This is an easier view to review tool inputs and parameters for potential problems.

Next, expand the input datasets in that view.

Group (sample) names must be unique across everything included in the same analysis job, and must match the headers of the count data inputs. What to fix depends a bit on how you are supplying the design matrix and count data.

If using Single Count Matrix mode: Do the count files supplied have headers containing sample names that exactly match the values supplied in each matrix? If not, the tool cannot match the data up, and you’ll need to add those in. This same mismatch has produced your exact error in earlier failures, so it is my best guess so far.
If using Separate Count Files mode: Similar checks apply, but the sample labels are supplied on the tool form → Factor and Group and (optionally) Contrast.

Give that a review, make adjustments, and see if you can resolve the missing or conflicting labels.
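For Single Count Matrix mode, the label check described above can be sketched in a few lines of Python run outside Galaxy. This is only an illustration: the function name `compare_labels` and the sample data are made up for the example, and it assumes a tab-separated matrix whose first column holds the gene IDs.

```python
import csv
import io


def compare_labels(count_matrix_text, design_samples):
    """Compare count-matrix header labels against the design's sample labels.

    Returns three lists: labels in the design but not in the header,
    labels in the header but not in the design, and labels that occur
    more than once in the header.
    """
    reader = csv.reader(io.StringIO(count_matrix_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumption: first column is the gene ID column
    duplicated = sorted({s for s in samples if samples.count(s) > 1})
    missing = [s for s in design_samples if s not in samples]
    unexpected = [s for s in samples if s not in design_samples]
    return missing, unexpected, duplicated


# Hypothetical inputs: the design says "Mut_2" but the header says "Mut2".
matrix = "GeneID\tWT1\tWT2\tMut1\tMut2\n11287\t1699\t1528\t1463\t1441\n"
design = ["WT1", "WT2", "Mut1", "Mut_2"]

missing, unexpected, duplicated = compare_labels(matrix, design)
print(missing, unexpected, duplicated)  # prints ['Mut_2'] ['Mut2'] []
```

The comparison is exact and case sensitive on purpose, since the tool matches labels literally.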

If you get stuck, please post back a #sharing-your-history link publicly as a reply, or ask for a moderator to start up a direct message chat to share it in. Please leave all inputs and outputs undeleted, and note the error dataset number with the problem.

Quote from the tool form help about how to format these data:

Labels (all sections)

NOTE: Please only use letters, numbers or underscores (case sensitive), and the first character must be a letter
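The label rule quoted above can be checked mechanically. The sketch below is one reading of that rule, not the tool's own validation code; it assumes "letters" means ASCII letters, and the name `is_valid_label` is made up for the example.

```python
import re

# One reading of the rule: first character a letter, then only
# letters, digits, or underscores. ASCII letters are an assumption.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_label(label):
    """Return True if the whole label fits the pattern above."""
    return LABEL_PATTERN.fullmatch(label) is not None


for label in ["WT1", "Mut_2", "1WT", "WT 1", "wt-1"]:
    print(repr(label), is_valid_label(label))
```

Here `WT1` and `Mut_2` pass, while `1WT` (starts with a digit), `WT 1` (contains a space), and `wt-1` (contains a hyphen) do not.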

There is a lot of Q&A at this forum about “format” issues with labels – I added a few tags to your post that will find most of those. Or, better, just check yours and simplify or add in as needed. Use no extra spaces and only the characters the tool knows how to read. Most tools are quite literal when interpreting keys/values.

Inputs

Counts Data:
The counts data can either be input as separate counts files (one sample per file) or a single count matrix (one sample per column). The rows correspond to genes, and columns correspond to the counts for the samples. Values must be tab separated, with the first row containing the sample/column labels and the first column containing the row/gene labels. The sample labels must start with a letter. Gene identifiers can be of any type but must be unique and not repeated within a counts file.

Example - Separate Count Files:

GeneID	WT1
11287	1699
11298	1905
11302	6
11303	2099
11304	356
11305	2528

Example - Single Count Matrix:

GeneID	WT1	WT2	WT3	Mut1	Mut2	Mut3
11287	1699	1528	1601	1463	1441	1495
11298	1905	1744	1834	1345	1291	1346
11302	6	8	7	5	6	5
11303	2099	1974	2100	1574	1519	1654
11304	356	312	337	361	397	346
11305	2528	2438	2493	1762	1942	2027
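The help text above requires gene identifiers to be unique within a counts file. A minimal Python sketch for spotting repeats before uploading follows. The function name `duplicate_gene_ids` and the `has_header` switch are made up for the example, and the input is assumed to be tab separated with gene IDs in the first column.

```python
import csv
import io
from collections import Counter


def duplicate_gene_ids(counts_text, has_header=True):
    """Return the gene IDs that appear more than once in the first column."""
    rows = list(csv.reader(io.StringIO(counts_text), delimiter="\t"))
    if has_header:
        rows = rows[1:]  # skip the sample/column label row
    tally = Counter(row[0] for row in rows if row)
    return [gene for gene, n in tally.items() if n > 1]


# Hypothetical file in which gene 11298 is listed twice.
example = "GeneID\tWT1\n11287\t1699\n11298\t1905\n11298\t1744\n11302\t6\n"
print(duplicate_gene_ids(example))  # prints ['11298']
```

An empty list means every gene ID in the file is unique.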

And, tutorials are available here with example data + methods + workflows.

Dan · December 15, 2022, 5:52pm

Hi jennaj,

Thanks for your advice! I was using individual HTseq-count files produced by the HTseq tool without modifying them in any way, so I’m unsure why it was giving me trouble. I ended up combining the files into a single count matrix and that fixed the problem.

Thanks again!
Dan
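For anyone wanting to repeat the workaround Dan describes, here is a rough Python sketch of combining per-sample count files into a single count matrix with a header row. It is not the method Dan used (the thread does not say how the files were combined). It assumes each htseq-count file is a header-less, two-column, tab-separated table whose summary rows start with `__` (for example `__no_feature`). The function names and sample labels are made up for the example.

```python
import csv
import io


def read_counts(text):
    """Parse one two-column count file into {gene_id: count}.

    Assumption: no header row, and summary rows start with '__'.
    """
    counts = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("__"):
            continue  # skip blank lines and summary rows
        counts[row[0]] = int(row[1])
    return counts


def build_matrix(files):
    """Combine {sample_label: file_text} into one tab-separated matrix."""
    tables = {label: read_counts(text) for label, text in files.items()}
    labels = list(tables)
    genes = list(tables[labels[0]])  # gene order taken from the first file
    for label in labels[1:]:
        if set(tables[label]) != set(genes):
            raise ValueError(f"gene IDs in {label} differ from {labels[0]}")
    lines = ["\t".join(["GeneID"] + labels)]
    for gene in genes:
        lines.append("\t".join([gene] + [str(tables[l][gene]) for l in labels]))
    return "\n".join(lines) + "\n"


# Hypothetical per-sample files, keyed by the sample label to use as header.
files = {
    "WT1": "11287\t1699\n11298\t1905\n__no_feature\t10\n",
    "WT2": "11287\t1528\n11298\t1744\n__no_feature\t12\n",
}
print(build_matrix(files), end="")
```

The sample labels become the column headers, so they should match the labels entered on the tool form exactly.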
