EdgeR data input problem

Hi all, I am using edgeR on Galaxy to normalize my data, but it keeps giving different error messages, and I think it is because I did not format my data correctly. Here are the first few lines of one of my data files. Could you kindly format these few lines for me so that I can reformat the others myself? Here are the gene IDs and the read counts:
ENSG00000000003 0
ENSG00000000005 0
ENSG00000000419 33
ENSG00000000457 31
ENSG00000000460 3
ENSG00000000938 2414
ENSG00000000971 0
ENSG00000001036 5
ENSG00000001084 5
ENSG00000001167 123
ENSG00000001460 0
ENSG00000001461 7
ENSG00000001497 5
ENSG00000001561 3


Hi @chris_chidiebere

EdgeR requires a header line in counts inputs. Maybe that is the problem? See the tool form help under Inputs > Counts Data to review the expected formatting.

You can use featureCounts or htseq-count in Galaxy to create counts that have a header, or add one to your existing count data before or after uploading it to Galaxy.

If you want to try to add a header using Galaxy, see the tools in the group Text Manipulation. Often the simplest route is to Upload a file that only has the header, then “stack” it on top of your count dataset using Concatenate. Make sure the second value in the header (the sample name) is distinct for each of your count datasets, or you’ll run into other problems.
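If you would rather add the header before uploading, the same "stack a header on top" idea can be sketched in a few lines of Python. This is only a minimal sketch: the function name `add_header` and the first header label `GeneID` are assumptions made for illustration (check the tool form help for the exact header the tool expects), and it assumes each data line holds exactly a gene ID and an integer count separated by whitespace.

```python
def add_header(counts_text, sample_name, id_label="GeneID"):
    """Return tab-separated counts with a header line stacked on top.

    counts_text: lines of "<gene id> <count>" separated by whitespace.
    sample_name: second header value; should be distinct per dataset.
    id_label:    first header value (hypothetical label, an assumption).
    """
    rows = []
    for line in counts_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene_id, count = line.split()
        int(count)  # raises ValueError if the count is not an integer
        rows.append(f"{gene_id}\t{count}")
    header = f"{id_label}\t{sample_name}"
    return "\n".join([header] + rows) + "\n"


# First three lines from the question, used as example input.
example = "ENSG00000000003 0\nENSG00000000005 0\nENSG00000000419 33\n"
print(add_header(example, "sample1"), end="")
```

Run once per count file with a different `sample_name` each time, so the second header value stays distinct between datasets.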

The formatting requirements for many tools are often specific and covered on tool forms. So if that doesn’t resolve the error, and you cannot determine how to fix the data on your own, the Galaxy EU team can jump in to help troubleshoot this from the administrative side.

Thanks!

Hi,

I am having a similar problem with uploading data to edgeR. I have my read count files in Excel and saved them as either .txt or .csv. When I upload them to Galaxy, the .txt files are missing headers and the .csv files have two headers (i.e., the header names are repeated in the row below; see the pictures below). If I try to use these files in edgeR, I get this error message:
"Warning message:
In data.frame(sampleID = samplenames, factors) :
  row names were found from a short variable and have been discarded
Error in colSums(data$counts) : ‘x’ must be numeric"

In which file format should I upload my files so that Galaxy recognises the headers correctly?

[Screenshots: csvfileupload, Textfileupload]


Hello,

The count data is not in the proper format. Please scroll down on the tool form for examples. The tool is expecting numbers (counts) in the second column – because gene names are there, the tool is failing.

Both datasets have a header inside the file. One just has that header recognized in a different way, probably from the csv to tabular conversion function. Either looks fine. But if the datatypes differ for some reason (autodetect guessed format incorrectly), use the tabular dataset with this tool.

Reformat the input and try again. Tools like Cut and Paste can remove/rearrange/merge columns of data.
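For anyone preparing the file outside Galaxy, the column selection that Cut performs can be sketched in Python. This is an illustrative sketch under stated assumptions, not the Galaxy tool itself: the function name `cut_columns`, the column positions, and the example table (including the gene name column and its values) are all hypothetical. It keeps only the gene ID column and one count column, and stops if a count is not numeric, which is the condition behind the "‘x’ must be numeric" error.

```python
import csv
import io


def cut_columns(tabular_text, id_col, count_col):
    """Keep only the gene ID column and one count column (0-based indexes).

    The first non-empty row is treated as the header; every later row
    must have a non-negative integer in the count column.
    """
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header_seen = False
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id, value = row[id_col], row[count_col]
        if header_seen and not value.isdigit():
            raise ValueError(f"non-numeric count for {gene_id}: {value!r}")
        header_seen = True
        writer.writerow([gene_id, value])
    return out.getvalue()


# Hypothetical table with gene names in the second column.
example = (
    "GeneID\tGeneName\tsample1\n"
    "ENSG00000000003\tgeneA\t0\n"
    "ENSG00000000419\tgeneB\t33\n"
)
print(cut_columns(example, 0, 2), end="")
```

Passing the gene name column as `count_col` by mistake raises a `ValueError`, which flags the same layout problem before the file reaches edgeR.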

From the edgeR tool form:

[Image: edger-count-input-format]

FAQ: Extended Help for Differential Expression Analysis Tools
