DESeq2 analysis with Galaxy

Hello everyone,

I’m currently identifying differentially expressed genes (DEGs) with DESeq2 in Galaxy, where I performed all of my analysis steps.

I performed whole-transcriptome RNA sequencing, imported the resulting FASTQ files into Galaxy, and then ran the following steps:

  1. fastp
  2. FastQC
  3. RNA STAR
  4. featureCounts

As the next step, I wanted to run DESeq2 to identify DEGs between the two groups of samples.
I used the featureCounts output files as input to DESeq2 and started the analysis.
After about 10 s, I get the following error: “Error in data.frame(sample = basename(filenames_in), filename = filenames_in, : duplicate row.names: /data/dnb06/galaxy_db/files/f/b/0/dataset_fb0b5307-b5a9-4a57-ba8f-a3077494c305.dat”.

However, I don’t have any duplicates: there are 58,051 rows, each for a different gene.

Has anybody encountered the same issue?

Thank you very much for your help!
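One way to double-check the “no duplicates” claim outside Galaxy is a minimal Python sketch like the one below. It assumes a downloaded featureCounts-style file: tab-separated, gene ID in the first field, one header line; `duplicate_gene_ids` is a hypothetical helper, not part of Galaxy or DESeq2, and the example rows are toy values. Note that the duplicated row name in the error is a dataset path rather than a gene ID, so an empty result here suggests the problem may lie elsewhere.

```python
from collections import Counter


def duplicate_gene_ids(lines, has_header=True):
    """Return gene IDs that occur more than once in a featureCounts-style table.

    Assumes tab-separated lines with the gene ID in the first field.
    """
    rows = lines[1:] if has_header else lines
    ids = [line.split("\t")[0] for line in rows if line.strip()]
    return sorted(gene for gene, n in Counter(ids).items() if n > 1)


# With a local copy of one count file (file name is hypothetical):
# with open("featureCounts_SF-13-516_positive.tsv") as handle:
#     print(duplicate_gene_ids(handle.read().splitlines()))
example = ["Geneid\tSF-13-516_positive", "ENSG00000000003\t12", "ENSG00000000005\t0"]
print(duplicate_gene_ids(example))  # []
```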

Hi @cannico
58k seems high for a number of genes. Are you using transcripts?
Do you use an individual count table per sample or a merged table? Are there spaces in the sample names? Make sure the sample names and conditions start with a letter, not a number, and contain no space characters.
If you still have an issue, I can take a look. Which Galaxy server do you use? Are you OK with sharing your history via a URL?
Kind regards,
Igor
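The naming advice above can be checked mechanically. A minimal Python sketch, assuming the rule is exactly “starts with an ASCII letter, contains no whitespace” (`invalid_names` is a hypothetical helper; Galaxy and DESeq2 may apply further restrictions):

```python
import re

# Rule of thumb from the reply above: start with a letter, no whitespace.
_VALID_NAME = re.compile(r"[A-Za-z]\S*")


def invalid_names(names):
    """Return the sample or condition names that break the rule above."""
    return [name for name in names if not _VALID_NAME.fullmatch(name)]


print(invalid_names(["positive", "negative", "1st group", "group A"]))
# ['1st group', 'group A']
```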

Hi @igor

Thank you for your reply. Yes, I use Ensembl IDs as input. I’m attaching a screenshot of an example file.

For featureCounts, I used an individual file for each sample. I have 25 samples in total: 15 in the positive group and 10 in the negative group. All files are named like this: featureCounts_SF-13-516_positive. I use the Galaxy Europe server.


I hope this helps,

Nico
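Because every file name ends with its group label, the two DESeq2 factor levels can be cross-checked against the names. A minimal Python sketch, assuming the negative files follow the same pattern with a `_negative` suffix (only the `_positive` pattern is shown above); `group_by_condition` is a hypothetical helper and `SAMPLE-B` a placeholder name. With all 25 names, the lists should hold 15 and 10 entries and the `None` key should be absent.

```python
from collections import defaultdict


def group_by_condition(filenames, conditions=("positive", "negative")):
    """Group file names by the condition suffix after the last underscore.

    Names whose suffix is not one of `conditions` are collected under None
    so they can be inspected instead of silently dropped.
    """
    groups = defaultdict(list)
    for name in filenames:
        suffix = name.rsplit("_", 1)[-1]
        groups[suffix if suffix in conditions else None].append(name)
    return dict(groups)


groups = group_by_condition([
    "featureCounts_SF-13-516_positive",
    "featureCounts_SAMPLE-B_negative",  # hypothetical sample name
])
print({k: len(v) for k, v in groups.items()})  # {'positive': 1, 'negative': 1}
```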

Hi @cannico
By any chance, have you selected the same sample(s) for both the control and experiment conditions/groups?
I ran featureCounts and DESeq2 with the Ensembl gene annotation on Galaxy Europe without problems, so it looks like something in the data or the job setup.
Kind regards,
Igor
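Igor’s question can be turned into a quick check on the list of selected inputs. A minimal Python sketch (`repeated_inputs` is a hypothetical helper; the dataset names are placeholders). A repeated input would be consistent with the “duplicate row.names” error, whose duplicated value is a dataset path:

```python
from collections import Counter


def repeated_inputs(factor_levels):
    """Return datasets selected more than once across all factor levels.

    factor_levels maps a level name to the dataset names chosen for it
    in the DESeq2 form; any repeat, within or across levels, is reported.
    """
    selected = [name for names in factor_levels.values() for name in names]
    return sorted(name for name, n in Counter(selected).items() if n > 1)


print(repeated_inputs({"positive": ["s1", "s2"], "negative": ["s3", "s2"]}))
# ['s2']
```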

Hi Igor,

Thank you again for your reply. I checked your suggestion, and they are definitely different samples.

As another check, I used only 3 samples per group, and now I get the following warning:
Warning message:
In a$V1 == l[[1]]$V1 :
longer object length is not a multiple of shorter object length.

From what I found, this warning means that the vectors being compared have different lengths. I can’t explain it, since every file has 58,051 lines, and Galaxy also reports that for each file. Do you have any other ideas?
Nico
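The warning appears to come from comparing the first column (`V1`) of each count file with that of the first file, so a length mismatch suggests that at least two of the inputs passed to the job differ in row count. A minimal Python sketch of the same comparison, assuming local copies of the files as lists of lines (`mismatched_tables` is a hypothetical helper, not the code Galaxy runs):

```python
def mismatched_tables(tables):
    """Compare each table's gene-ID column against the first table's.

    `tables` maps a sample name to the lines of its count file. Returns a
    dict of sample name -> reason for every table that does not line up.
    """
    def gene_ids(lines):
        # First tab-separated field of every non-empty line (header included).
        return [line.split("\t")[0] for line in lines if line.strip()]

    names = list(tables)
    if not names:
        return {}
    reference = gene_ids(tables[names[0]])
    problems = {}
    for name in names[1:]:
        current = gene_ids(tables[name])
        if len(current) != len(reference):
            problems[name] = f"{len(current)} rows vs {len(reference)}"
        elif current != reference:
            problems[name] = "same length, different gene IDs or order"
    return problems


ref = ["Geneid\tA", "G1\t1", "G2\t2"]
print(mismatched_tables({"A": ref, "B": ["Geneid\tB", "G1\t5"]}))
# {'B': '2 rows vs 3'}
```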

Hi @cannico
If you share the history, I can have a look. You could copy the last failed job and its input files into a new history and share that. Datasets can be copied between histories in the multiple-histories view. History sharing is under the History menu (icon at the top right corner of the history panel) > Share or publish. You can paste the URL here.
Kind regards,
Igor

Hi @igor

Here is the history link.

Let me know if you need further information and thank you for your time.

Nico

Hi @cannico
Sample #20 is selected in both conditions.
Note the two black angles at the bottom right corner of the sample selection window. Hover over them with the mouse and drag down to enlarge the window. Do this for both conditions and check which samples are used for the job.
Hope that helps.
If the issue is resolved, unshare/unpublish the history.
Kind regards,
Igor


Hi @igor,

Thank you for your help! I fixed the error by replacing the duplicated sample with the correct file. Now, after rerunning DESeq2 with the corrected inputs, I get another error:
Error in DESeqDataSet(se, design = design, ignoreRank) :
some values in assay are negative
Calls: get_deseq_dataset … DESeqDataSetFromHTSeqCount → DESeqDataSetFromMatrix → DESeqDataSet

This is strange, because I checked and I don’t have any negative values.

Nico

Hi @cannico
In the DESeq2 job setup, “Files have header” is set to No, while the count files do have headers.
Hope that helps.
Kind regards,
Igor
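Whether the “Files have header” setting matches the data can be checked before rerunning the job. A minimal Python sketch using a simple heuristic (an assumption, not how DESeq2 or Galaxy detects headers); `looks_like_header` is a hypothetical helper:

```python
def looks_like_header(first_line):
    """Heuristically decide whether a count file's first line is a header.

    Assumes data rows end with an integer count, so a first line whose last
    tab-separated field is not an integer is treated as a header.
    """
    last_field = first_line.rstrip("\n").split("\t")[-1]
    try:
        int(last_field)
    except ValueError:
        return True
    return False


print(looks_like_header("Geneid\tSF-13-516_positive"))  # True
print(looks_like_header("ENSG00000000003\t12"))  # False
```

If this returns True for the first line of every count file, the header option in the DESeq2 form should be set to Yes.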
