Datasets error and failure in DE analysis for mirna-seq

kmy.88 · March 5, 2025, 2:55am

Hi, I’m not sure where the problem is, but I can’t seem to run differential expression analysis for miRNA-seq. I have a list of miRNAs with read counts for all samples in CSV format. Once it was uploaded, I opted for RNA-seq (Tool) and set up the factor levels. However, this error kept popping up:

error in data.frame(sample = basename(filenames_in), filename = filenames_in, :
duplicate row.names: /corral4/main/objects/7/9/a/dataset_79a04117-72b3-451e-bf04-82f061015503.dat

Can anyone help determine the problem and tell me where it went wrong? This is my data. Should I upload metadata as well?

jennaj · March 5, 2025, 7:06pm

Welcome @kmy.88

Thanks for sharing the error!

The part of the message with duplicate row.names is the important part. This can result from actual duplicated sample names, but also from file formatting issues.

Searching with the error message at this forum yields these hits:

Search results for 'duplicate row.names order:latest' - Galaxy Community Help
Good simple answer where the issue was the “gene” format (the first column in a matrix like yours) → Differential gene expression: DESeq2 error

That same search at the Bioconductor forum yields these hits.

Bioconductor Forum

Most of those point back to formatting. Remember these are R tools, so any extra whitespace in identifiers or headers can cause problems. Identifiers are best interpreted when they “only include alphanumeric characters and optionally underscores, and not starting with a number”. Galaxy will clean this up a bit for you, but it can’t do this perfectly, so it is best to start off with “clean” data, especially if the tool reports a format error.
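If you want to check a list of identifiers before uploading, here is a minimal Python sketch of that rule. The pattern is my own reading of the guideline (a letter first, then letters, digits, or underscores), not something the tool itself documents, and the example names are hypothetical rather than taken from your file.

```python
import re
from collections import Counter

# Assumed pattern: a letter first, then letters, digits, or underscores.
VALID_ID = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def check_identifiers(names):
    """Return (invalid, duplicated) identifiers from a list of names."""
    # Anything with spaces, dashes, or a leading digit fails the pattern.
    invalid = [n for n in names if not VALID_ID.fullmatch(n)]
    # Names that occur more than once, reported once each.
    counts = Counter(names)
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return invalid, duplicated

# Hypothetical identifiers, for illustration only.
example = ["hsa-miR-21-5p", "hsa_let_7a", " sample1", "1_ctrl", "hsa_let_7a"]
invalid, duplicated = check_identifiers(example)
print("invalid:", invalid)
print("duplicated:", duplicated)
```

Anything reported as invalid or duplicated is worth fixing before the run.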

Right now, the first things that stick out to me in your screenshot are the space in the first column header and the dashes in your identifier names. I would remove the space and replace the dashes with underscores. Then double check that the sample names in the other header columns are unique, and try the run again.
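Those edits can be done by hand in a spreadsheet or text editor, or scripted. Here is a rough Python sketch that cleans the header row and the first column of a CSV and then reports duplicates. The function names and the example table are made up for illustration, and I am assuming a plain comma-separated file with identifiers in the first column.

```python
import csv
import io
import re
from collections import Counter

def clean_name(name):
    """Trim outer whitespace, then turn inner whitespace and dashes into underscores."""
    return re.sub(r"[\s\-]+", "_", name.strip())

def find_duplicates(names):
    """Return the names that appear more than once."""
    return sorted(n for n, c in Counter(names).items() if c > 1)

def clean_count_table(text):
    """Clean header and first column of CSV text; return (rows, dup_headers, dup_ids)."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    rows[0] = [clean_name(cell) for cell in rows[0]]
    for row in rows[1:]:
        row[0] = clean_name(row[0])
    dup_headers = find_duplicates(rows[0])
    dup_ids = find_duplicates([row[0] for row in rows[1:]])
    return rows, dup_headers, dup_ids

# Hypothetical count table, not the poster's actual data.
raw = (
    "miRNA ID,ctrl-1,ctrl-2,treat-1\n"
    "hsa-miR-21-5p,10,12,30\n"
    "hsa-let-7a-5p ,5,7,2\n"
)
rows, dup_headers, dup_ids = clean_count_table(raw)
print(rows[0])
print([row[0] for row in rows[1:]], dup_headers, dup_ids)
```

The duplicate check runs after cleaning on purpose, since two names that differ only by a dash or a trailing space become identical once cleaned.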

This message can show up for other reasons but this is where I would start. Remember to make changes in all files as needed since these tools are “matching up” common identifiers across files.
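If you do end up using a separate sample sheet or metadata file, a quick way to confirm the names line up is to compare them as sets. This is only a sketch with hypothetical sample names, and it assumes the first header column of the count table holds the miRNA identifiers.

```python
def compare_sample_ids(count_header, metadata_ids):
    """Return sample names that appear in only one of the two files."""
    count_samples = set(count_header[1:])  # skip the identifier column
    metadata_samples = set(metadata_ids)
    only_in_counts = sorted(count_samples - metadata_samples)
    only_in_metadata = sorted(metadata_samples - count_samples)
    return only_in_counts, only_in_metadata

# Hypothetical names: one metadata entry still uses a dash.
header = ["miRNA_ID", "ctrl_1", "ctrl_2", "treat_1"]
metadata = ["ctrl_1", "ctrl-2", "treat_1"]
only_counts, only_meta = compare_sample_ids(header, metadata)
print("only in counts:", only_counts)
print("only in metadata:", only_meta)
```

Both lists should be empty when the files agree.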

If you get stuck, you are welcome to share back your history for more feedback! Without clean files, it is hard to tell from what you have shared so far whether something else is also involved.

See the sharing FAQ link in here → How to get faster help with your question

Let’s start there, and let us know if you solve this!

XRef example tutorial that includes tools commonly used for this analysis domain → Hands-on: Whole transcriptome analysis of Arabidopsis thaliana / Transcriptomics
