featureCounts collection not showing up as input for DESeq2

Hello,
I am using an RNA-seq dataset (triplicates) to analyze differentially expressed genes in bacterial systems. I could run the analysis up to the featureCounts stage without any problem. But when I tried to select the featureCounts results as input for the DESeq2 tool, I couldn’t find the featureCounts collection. It only shows the single-replicate count files and not the collection that holds all three replicates. It would be great if someone could help me with this.

Thank you
Jawahar

Hi Jawahar,
did you switch to collection input mode when setting up the DESeq2 job? By default, DESeq2 shows individual files.
Kind regards,
Igor

Hello Igor,
As I mentioned in the previous email, the data doesn’t have replicates.

Thank you
Jawahar

Hi Jawahar,
can you please share the history with me so I can have a look? You can share a history from the history menu and paste the link in your reply. A brief description will help, for example whether the inputs are individual files or a collection.
I am confused: in the original post you mention a collection with three replicates, and now you say the data has no replicates.
Kind regards,
Igor

Hello Igor,
Thank you very much for the reply and sorry about the confusion.

1. Replications: Actually, I am working on multiple publicly available RNA-seq datasets from several sulfate-reducing bacteria for machine learning studies. I understand your clarification about replicates, and this problem is resolved. (Question resolved)

2. HISAT2 error: I am working on a separate RNA-seq dataset from our wet-lab experiment with triplicates. Please find the link below for the history named “RNA seq Gr Day 60”. Here we have RNA-seq data of Desulfovibrio alaskensis bacteria exposed to five different experimental conditions. Below are the Galaxy pipeline steps along with their dataset numbers. Also, as I mentioned in my previous email, I see an issue with the HISAT2 output. Among the columns in the HISAT2 result, both the MPOS and ISIZE columns contain only zeros for all the genes. When I used the same Galaxy pipeline last month, these columns contained non-zero values and I ended up with good results. I suspect that my HISAT2 BAM file has some issues and that this is reflected in the DESeq2 error. (Question yet to be resolved)

Files 7, 14, 30, 37 and 44 - Raw reads made into a collection of triplicates

143 - Trimmomatic of data collection 7
150 - Trimmomatic of data collection 14
157 - Trimmomatic of data collection 30
164 - Trimmomatic of data collection 37
180 - Trimmomatic of data collection 44 (error)
187 - Reference genome (.fa)
188 - Reference genome (.gff3)
189 - HISAT2 of collection 143
196 - HISAT2 of collection 150
203 - HISAT2 of collection 157
210 - HISAT2 of collection 164
217/218 - Featurecounts for collection 189
231/232 - Featurecounts for collection 196
245/246 - Featurecounts for collection 203
259/260 - Featurecounts for collection 210

273/274 - DESeq2 on paired-end reads of data 219 (as the collection was not showing up while selecting the input data). Please find the screenshots attached.

3. DESeq2 error?: Later I realized that I may be running DESeq2 on paired-end datasets instead of collections, as the collection was not showing up when I selected the input for DESeq2. Please find attached screenshots of the mouse pointer at Collection and Multiple datasets. (Question yet to be resolved)
I believe problems 2 and 3 might be related. It would be great if you could clarify this.

4. QC and Trimmomatic error for one replicate among the triplicates: QC and Trimmomatic showed an error for one replicate (the third one) among the triplicates. The raw data were in file 44, the third replicate. Files 129, 130 (QC) and 180 (Trimmomatic) were the error files, and I didn’t process this sample to the next stage. (Question yet to be resolved)

https://usegalaxy.org/history/view_multiple

Thank you very much, and please let me know if you need more information. My research has been stuck because of this issue for the past two weeks. It would be great if you could help me resolve this.
Jawahar

Hello Igor,
Also, I can see the versions (attached) in my Galaxy. Could you please tell me which of these versions is earlier than v1.22?

Thank you
Jawahar

Picture8.jpg

Hi Jawahar,
DESeq2 version: you can find this information in the Galaxy Tool Shed. Switch to any version of DESeq2 in Galaxy > during the job setup step, click on Options (small triangle next to the Versions icon in the top right corner of the middle window) > find the requirements for this version.

The link you posted is a generic link to View all histories. With it, I can only see my own histories on the server.

Zeros in MPOS and ISIZE columns: by any chance, are you using single-end data?

#3 - sorry, I don’t understand the problem.

QC error: do you mean a FastQC error? What kind of error message did you get? Does it mention a truncated file (premature EOF)? Check the fastq file(s) for integrity. If the files are plain text fastq, select the last four lines and check the last read(s) for integrity. If the reads are gzipped FASTQ, change it to plain text FASTQ in Edit attributes (pencil icon) > Convert > select the uncompress option. If it fails, the file is truncated.
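
If you have the file on your own computer, the same integrity check can be sketched in Python. This is only a minimal sketch, assuming standard four-line FASTQ records with no blank lines; `check_fastq` and `open_fastq` are hypothetical helper names, not part of Galaxy or FastQC.

```python
import gzip
from pathlib import Path


def open_fastq(path):
    """Open a plain-text or gzip-compressed FASTQ file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_fastq(path):
    """Return (ok, message) after a basic structural check of every record."""
    record = []
    n_records = 0
    try:
        with open_fastq(path) as handle:
            for line in handle:
                record.append(line.rstrip("\n"))
                if len(record) == 4:
                    header, seq, plus, qual = record
                    n_records += 1
                    if not header.startswith("@") or not plus.startswith("+"):
                        return False, f"record {n_records}: malformed header or separator"
                    if len(seq) != len(qual):
                        return False, f"record {n_records}: sequence and quality lengths differ"
                    record = []
    except (EOFError, OSError) as exc:
        # A truncated gzip stream typically surfaces here as a read error.
        return False, f"could not read to end of file: {exc}"
    if record:
        return False, f"file ends in the middle of record {n_records + 1}"
    return True, f"{n_records} complete records"
```

Calling `check_fastq("reads_R2.fastq.gz")` (a placeholder file name) returns `False` with a short reason if the file stops partway through a read.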

Hope this helps.

Kind regards,

Igor

Hello Igor,
Sorry about the link. The original link is here (https://usegalaxy.org/root?tool_id=toolshed.g2.bx.psu.edu/repos/iuc/deseq2/deseq2/2.11.40.7+galaxy1). I hope you can now access all the data in the history. Here, I am unable to select the featureCounts dataset collection for DESeq2 analysis, as it is not showing up in the folders. Could you please help me with the HISAT2 problem and the above DESeq2 problem?

Also, regarding the FastQC error, it reports: Fatal error: Exit code 1 (FastQC returned non zero exit code)

Please suggest your thoughts.
Thank you very much

Jawahar

Hi Jawahar,
the link you posted is for a job setup, not the history. You can share history in the history menu (cog wheel icon at the top of the history panel) and paste the link in reply.
Kind regards,
Igor

Hello Igor,
Sorry again. Please find below the history link. Thank you

https://usegalaxy.org/u/jawahar/h/copy-of-rna-seq-gr-day-60

Jawahar

Hi Jawahar,
thank you for the shared history.
FastQC error on collection (dataset #129). Click on the collection, click on the third sample (the last one), then click on the error icon of the failed job on the reverse reads. The error log has the following:
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle of a fastq entry. Your file is probably truncated

Failed DESeq2 (dataset #273). I am sorry for repeating the answer but DESeq2 does not support analysis without replicates since v1.22. This job was submitted using the latest DESeq2 version and no replicates.
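
Before submitting, it can help to list which factor levels have only one sample. The following is a minimal sketch, assuming a single factor and a hand-written sample sheet; the sample names and `levels_without_replicates` are hypothetical, and whether a given design is accepted is still decided by DESeq2 itself.

```python
from collections import Counter


def levels_without_replicates(sample_to_condition):
    """Return the conditions that have fewer than two samples."""
    counts = Counter(sample_to_condition.values())
    return sorted(level for level, n in counts.items() if n < 2)


# Hypothetical sample sheet, for illustration only.
samples = {
    "control_1": "control",
    "control_2": "control",
    "control_3": "control",
    "treated_1": "treated",
}

# Lists the levels that would need more samples to count as replicated.
print(levels_without_replicates(samples))
```

Here only the "treated" level is reported, because it has a single sample.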

No MPOS and ISIZE. You submitted these jobs with single-end reads, two jobs per sample. You have two alignments for every sample, one for forward and one for reverse reads. During HISAT2 job setup, select the paired-end option.
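
To check this outside Galaxy, here is a minimal sketch that assumes the BAM has already been converted to SAM text (for example with `samtools view -h`). MPOS and ISIZE are the older names for SAM columns 8 and 9 (PNEXT and TLEN); the paired flag is bit 0x1 of column 2. `summarize_sam` and the two example records are made up for illustration.

```python
def summarize_sam(lines):
    """Count records, paired-flag records, and records with non-zero PNEXT/TLEN."""
    total = paired = with_mate_info = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, pnext, tlen = int(fields[1]), int(fields[7]), int(fields[8])
        total += 1
        if flag & 0x1:  # bit 0x1: read is part of a pair
            paired += 1
        if pnext != 0 or tlen != 0:
            with_mate_info += 1
    return {"total": total, "paired": paired, "with_mate_info": with_mate_info}


# Made-up records: one aligned as single-end, one aligned as part of a pair.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tchr\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t99\tchr\t100\t60\t4M\t=\t300\t204\tACGT\tIIII",
]
print(summarize_sam(example))
```

If `paired` and `with_mate_info` are both zero for a whole file, the alignment was run in single-end mode.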

Hope this helps.

Kind regards,

Igor

Hello Igor,
Thank you very much for your detailed replies. I see that for DESeq2, Galaxy version 2.1.8.3 is the oldest version available. Would it work for running DESeq2 on data without replicates?

Thank you very much for your great help.

regards
Jawahar

It should. You can check the DESeq2 version in the Tool Shed through Options (the icon next to the Tool Version icon) > See in Tool Shed. For example, Galaxy version deseq2/2.11.39 uses bioconductor-deseq2 1.14.1, so the earlier Galaxy versions all use even earlier versions of the software.
Kind regards,
Igor

Hello Igor,
Thank you very much for the help. I was able to fix the HISAT2 errors and reached the DESeq2 step. The tools are working well for replicates. But for the sample without replication, I ended up with an error in DESeq2. Although the job turned green, the DESeq2 result file is empty (data 71 and 72). I used Galaxy version 2.1.8.3. Below is the link for the data. It would be great if you could help me fix this.

https://usegalaxy.eu/u/jawahar/h/dvh-cuo

Thank you very much for your great help.
Jawahar

Hi Jawahar,
the standard error file has the following:
/usr/local/tools/R/3.2.1/iuc/package_r_3_2_1/e46a7803f17b/lib/R/bin/exec/R: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

It is an old version, and it might be non-functional. Try other versions, for example 2.11.39. Some of these versions fail when a header line is present in the count tables. If it fails with an error pointing to a non-numeric value, remove the first line from the count files.
Kind regards,
Igor

Hello Igor,
Thank you very much for the information. I tried all the versions from 2.11.39 downwards. As you predicted, it showed an error when I used 2.11.39 (below is the link for the history). Could you please tell me how to remove the first row (the header line) from the count files?

https://usegalaxy.eu/u/jawahar/h/dvh-cuo

Thank you very much for your great help.
Regards
Jawahar

Hi Jawahar,
count tables are tab-separated text files. You can use any appropriate tool in the Text Manipulation section.
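
Outside Galaxy, the same step can be sketched in a few lines of Python. This is a minimal sketch, assuming a two-column table (gene ID, then count) where a header is recognizable because its second column is not an integer; `strip_header` and the gene IDs are hypothetical.

```python
def strip_header(lines):
    """Drop the first line if its second column is not an integer count."""
    lines = list(lines)
    if not lines:
        return lines
    first = lines[0].rstrip("\n").split("\t")
    try:
        int(first[1])
    except (IndexError, ValueError):
        # Second column is missing or non-numeric: treat the line as a header.
        return lines[1:]
    return lines


# Made-up count table with a header line, for illustration only.
table = ["Geneid\tsample1\n", "gene_0001\t12\n", "gene_0002\t0\n"]
for row in strip_header(table):
    print(row, end="")
```

A table that already starts with a numeric count row is returned unchanged, so the function is safe to run twice.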
Kind regards,
Igor

Hello Igor,
It worked. Thank you very much for your great help.

Regards
Jawahar

Hi Jawahar,
you can unshare the histories, so no one can access the data.
This can be done in History menu (cog wheel icon) > Share or Publish > disable sharing.
Kind regards,
Igor

Hello Igor,
Sure, I just unshared the history. Thank you

Jawahar