Error when running DESeq2: help resources and input strategies

I am trying to do DGE analysis. I downloaded the data from SRA, ran HISAT2 on it, and then used featureCounts (I was getting empty gene counts when using StringTie). However, when I try to use the featureCounts output for DESeq2, I get this error:

Execution resulted in the following messages:

            Fatal error: An undefined error occurred, please check your input carefully and contact your administrator.

Tool generated the following standard error:

        Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length

Calls: rownames<- ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-

This is my first time doing RNA-Seq analysis, so please bear with me. I would be grateful if anyone could help me with how to proceed further.

Welcome, @Ketan_Chandra

Anytime an R tool creates an error like yours, it means the tool had trouble creating a “data frame” internal to the program (a data matrix with columns and rows).

In short, the tool calculates statistics on the count values across the different groupings. When the tool cannot create that data structure, it fails with an error.

The issue is usually with the labels input on the form, but can also be some problem with content inside of the count files.
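To make the error message a bit more concrete: in R, assigning row names to a data frame fails with `invalid 'row.names' length` when the number of names does not match the number of rows. The Python sketch below is only an analogy for that kind of check, not the tool's actual code. The function `build_sample_table` and its arguments are hypothetical.

```python
def build_sample_table(count_files, labels):
    """Pair each count file with one row label (hypothetical helper)."""
    if len(labels) != len(count_files):
        # Analogous to R's "invalid 'row.names' length"
        raise ValueError(
            f"invalid row-name length: {len(labels)} labels "
            f"for {len(count_files)} rows"
        )
    if len(set(labels)) != len(labels):
        # R data frames also reject duplicated row names
        raise ValueError("duplicate row names are not allowed")
    return dict(zip(labels, count_files))
```

If the labels and the count files do not line up one-to-one, the table cannot be built, which is why checking the form inputs first is usually the quickest route.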

Please review the help below and let us know if you solve this! If you run into any trouble, you can post back a share link to your history and we’ll take a closer look. How to do that is in the banner at this forum, and also here: How to get faster help with your question

What to check:

  1. Content input on the form

    Make sure you are using unique labels on the form, and none are missing.

    How to format the labels is on the forum, but since this is an R tool, using those same “rules” is a good idea: numbers, letters, and optionally underscores. Not starting with a number, and no spaces.

    The Galaxy form will try to fix this for you … but if there are any problems, simplify the labels yourself, since it is impossible to make this auto-fixing part perfect!

  2. The other item to check is the count file content.

    This tool is expecting just two columns of data in the count files.

    First column: the list of genes and the second column: count per gene for that sample.

    Also, the header needs to be in a specific format: GeneID (common for all files) and SampleID (unique for that sample). See the Help example on the form for what is expected (scroll down to find it). Most counting tools run in Galaxy produce the correct format automatically, but uploaded count data may not and you’ll need to adjust it.

    Finally, double check that all count files were generated using the exact same reference data (GTF). That should result in all of the files having the same number of rows, because they all have the same listing of genes in the first column. The differences between samples are 1) how they are labeled/grouped and 2) the count values; everything else is the same!

  3. Tutorials with example data, step descriptions, and workflow templates can be found here (not just for DESeq2, but also limma and edgeR): Transcriptomics / Tutorial List

  4. A filtered list of tutorials that specifically include DESeq2 is linked in the Help section of the tool form; that same link is also here: Galaxy Training!

  5. More help with common problem solving for these tools can be found here: FAQ: Extended Help for Differential Expression Analysis Tools. You can also look at this forum under the tag transcriptomics, or search by the tool name or parts of the error message.
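The label and count-file checks from items 1 and 2 above can be sketched as a small script to run on downloaded copies of your files. This is a minimal sketch, not part of Galaxy or DESeq2. It assumes tab-separated files with one header line, and the function names are hypothetical.

```python
import csv
import re

# Letters, digits, underscores only; must not start with a digit; no spaces.
LABEL_RE = re.compile(r"^(?!\d)[A-Za-z0-9_]+$")


def is_safe_label(label):
    """Return True if the label follows the simple naming rules above."""
    return bool(LABEL_RE.match(label))


def read_gene_ids(path):
    """Return the first-column gene IDs of a two-column count file."""
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        raise ValueError(f"{path}: file is empty")
    for number, row in enumerate(rows, start=1):
        if len(row) != 2:
            raise ValueError(
                f"{path}: line {number} has {len(row)} columns, expected 2"
            )
    return [row[0] for row in rows[1:]]  # skip the assumed header line


def check_count_files(paths):
    """Check that every file lists the same genes in the same order."""
    reference = read_gene_ids(paths[0])
    for path in paths[1:]:
        if read_gene_ids(path) != reference:
            raise ValueError(f"{path}: gene list differs from {paths[0]}")
    return len(reference)  # number of genes shared by all files
```

For example, `check_count_files(["sample1.tabular", "sample2.tabular"])` (hypothetical file names) returns the shared gene count, or raises an error naming the first file that does not match.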

Let’s start there, and you can ask more questions! :slight_smile:

Thanks for the reply. I actually realized that instead of inputting a dataset collection, I should’ve given all the files as input individually. This made the error go away and I finally got my results.

Hi @Ketan_Chandra, glad you have this working!

For others reading:

This is the toggle on the form to set which way you want to organize the inputs. Toggle your choice to fit the data you have.

Select datasets per level

  1. individual count files (separate files)

  2. multiple collections of count files (one collection per factor level)

Select group tags corresponding to levels

  1. all the count files together in a single collection that is annotated with “group tags”.

Adding group tags is commonly done at the very start when loading up data, and is a somewhat advanced way to use Galaxy. The bonus is that you can process everything together as a batch for upstream steps, but still split out the data when using a downstream tool. The tutorials above show how this works with DESeq2 specifically, but this short guide covers the basics → FAQ: Adding a tag

Many ways to do the same things! :hammer_and_wrench: