I am trying to do DGE, for which I downloaded the data from SRA, ran HISAT2 on it, and then used featureCounts (I was getting empty gene counts when using StringTie). However, when I try to use the featureCounts output for DESeq2, I get this error:
Execution resulted in the following messages:
Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.
Tool generated the following standard error:
Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
Anytime an R tool creates an error like yours, it means the tool had trouble creating a “data frame” internal to the program (a data matrix with columns and rows).
In short, the different count values in different groupings are what the tool is calculating statistics about. When the tool cannot create that data structure, it will error.
The issue is usually with the labels input on the form, but can also be some problem with content inside of the count files.
Please give the help below a review and let us know if you solve this! If you have any trouble, you can post back a share link to your history and we’ll take a closer look. How to do that is explained in the banner at this forum, and also here: How to get faster help with your question
What to check:
Content input on the form
Make sure you are using unique labels on the form, and none are missing.
How to format the labels is covered on the forum, but since this is an R tool, following those same “rules” is a good idea: numbers, letters, and optionally underscores. Do not start a label with a number, and do not use spaces.
The Galaxy form will try to fix this for you … but if you run into any problems, simplify the labels yourself, since it is impossible to make this auto-fixing part perfect!
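If it helps, the label rules above can be checked outside of Galaxy before filling in the form. This is only a minimal sketch in Python: the function name `check_labels` and the exact pattern are my own assumptions based on the rules listed here, not anything DESeq2 or Galaxy provides.

```python
import re

# Assumed rule set, taken from the advice above: letters, digits and
# underscores only, no spaces, and not starting with a digit.
LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_labels(labels):
    """Return a list of human-readable problems found in the labels."""
    problems = []
    seen = set()
    for label in labels:
        if not label:
            # A missing label is reported on its own and not checked further.
            problems.append("empty label")
            continue
        if not LABEL_PATTERN.fullmatch(label):
            problems.append(f"invalid label: {label!r}")
        if label in seen:
            problems.append(f"duplicate label: {label!r}")
        seen.add(label)
    return problems
```

An empty list back means the labels follow the rules; for example, `check_labels(["treated", "control_1"])` reports nothing, while a label such as `"1treated"` or `"a b"` is flagged.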
The other item to check is the count file content.
This tool is expecting just two columns of data in the count files.
The first column is the list of genes, and the second column is the count per gene for that sample.
Also, the header needs to be in a specific format: GeneID (common for all files) and SampleID (unique for that sample). See the Help example on the form for what is expected (scroll down to find it). Most counting tools run in Galaxy produce the correct format automatically, but uploaded count data may not and you’ll need to adjust it.
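For uploaded count data, a quick check of the two-column layout can save some guessing. The sketch below assumes the file is tab-separated and that counts are whole numbers; `check_count_table` is a hypothetical helper name, and it works on any iterable of lines, so it can be used on an open file downloaded from your history.

```python
def check_count_table(lines):
    """Check a two-column, tab-separated count table given as lines of text.

    Returns (header, problems): the two header fields (or None if the first
    line is malformed) and a list of problem descriptions.
    """
    problems = []
    header = None
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            problems.append(
                f"line {number}: expected 2 columns, found {len(fields)}"
            )
            continue
        if number == 1:
            # First line is treated as the header (gene column, sample column).
            header = fields
            continue
        if not fields[1].isdigit():
            problems.append(
                f"line {number}: count {fields[1]!r} is not a whole number"
            )
    return header, problems
```

To use it on a real file, open the file in text mode and pass the handle in, then compare the returned header against the Help example on the tool form.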
Finally, double check that all count files were generated using the exact same reference data (GTF). That should result in all of the files having the same number of rows, because they all have the same listing of genes in the first column. The differences between samples are 1) how they are labeled/grouped, and 2) the count values. Everything else is the same!
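One way to confirm that every count file lists the same genes is to compare the first columns against each other. This is a rough sketch, not part of any tool: `compare_gene_lists` is a made-up name, and it assumes you have already read the gene IDs from each file into a list per sample.

```python
def compare_gene_lists(tables):
    """Compare gene ID lists across samples.

    `tables` maps a sample name to its ordered list of gene IDs. The first
    sample is used as the reference. Returns a list of problem descriptions;
    an empty list means every sample has the same genes as the reference.
    """
    problems = []
    items = list(tables.items())
    if not items:
        return problems
    ref_name, ref_genes = items[0]
    ref_set = set(ref_genes)
    for name, genes in items[1:]:
        if len(genes) != len(ref_genes):
            problems.append(
                f"{name}: {len(genes)} genes, but {ref_name} has {len(ref_genes)}"
            )
        missing = ref_set - set(genes)
        extra = set(genes) - ref_set
        if missing or extra:
            problems.append(
                f"{name}: {len(missing)} genes missing and "
                f"{len(extra)} unexpected relative to {ref_name}"
            )
    return problems
```

If this reports differences, the files were most likely counted against different annotations, and re-running the counting step with one GTF for all samples is the usual fix.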
Tutorials with example data, step descriptions, and workflow templates can be found here (not just for DESeq2, but also limma and edgeR): Transcriptomics / Tutorial List
A filtered list of tutorials that specifically include DESeq2 is linked in the Help section of the tool form, and that same link is also here: Galaxy Training!
Thanks for the reply. I actually realized that instead of inputting a dataset collection, I should’ve given all the files as input individually. This made the error go away and I finally got my results.
Adding group tags is commonly done at the very start when loading up data, and is a somewhat advanced way to use Galaxy. The bonus is that you can process everything together as a batch for upstream steps, but still be able to split out the data when using a downstream tool. The tutorials above show how this works with DESeq2 specifically, but this short guide covers the basics → FAQ: Adding a tag