1. The FastQC report shows that your reads need 5’ trimming.
2. The Trinity logs show that reads shorter than 25 bases still remain.
You haven’t used fastp yet as @igor recommended. Trimmomatic can do this too. Both can address items 1 and 2. I think we have already shared the QA tutorial, but here is the link again: Quality Control
rnaSPAdes was successful, and is a better tool choice overall, but remember: a job that does not fail only means that nothing went wrong technically; it does not guarantee a scientifically sound result.
Start with the QA steps first. Keep the FastQC results from both before and after fastp in that same history; don’t delete them.
I got the history. Thank you! You can delete the link or keep it for some time, it is up to you.
I could not spot any obvious issue. The nucleotide composition changes along the read length, and the Phred quality scores use only a few distinct values. I submitted a couple of test jobs; they will take some time. @jennaj I am not familiar with the setup of the main server. What does this error mean?
ocean/projects/mcb140028p/xcgalaxy/main/staging/50953944/.cvmfsexec/mountrepo: line 70: cd: /ocean/projects/mcb140028p/xcgalaxy/main/staging/50953944/.cvmfsexec/dist/cvmfs/cvmfs-config.cern.ch/etc/cvmfs: No such file or directory
That error can come from this specific tool when it does not produce any output at all.
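The observation above that the Phred scores use only a few values can be checked directly: recent Illumina instruments bin quality scores into a handful of levels, which is what the FastQC per-base quality plot reflects. A minimal sketch (the quality string below is made up for illustration):

```python
# Count how many distinct Phred values a FASTQ quality string uses.
# Binned Illumina data typically shows only ~4 distinct levels.
def phred_values(qual: str, offset: int = 33):
    """Decode a Phred+33 quality string into sorted distinct scores."""
    return sorted({ord(c) - offset for c in qual})

# Illustrative binned quality string: '#'=Q2, '-'=Q12, '8'=Q23, 'F'=Q37
q = "#" * 3 + "-" * 5 + "8" * 10 + "F" * 20
print(phred_values(q))  # [2, 12, 23, 37]
```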
I think the reads need to be 5’ trimmed, then length filtered. Trimmomatic with the right settings will do that, or fastp.
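For clarity, a minimal sketch of what that trim-then-filter step does to a single read pair; fastp’s `--trim_front1`/`--trim_front2` and `--length_required` options (or Trimmomatic’s HEADCROP and MINLEN steps) perform the same operation at scale. The cutoff values here are illustrative, taken from the numbers discussed in this thread:

```python
TRIM_5P = 10   # bases to cut from the 5' end (per the FastQC report)
MIN_LEN = 25   # drop reads shorter than this after trimming

def trim_and_filter(seq: str, qual: str,
                    trim_5p: int = TRIM_5P,
                    min_len: int = MIN_LEN):
    """Cut trim_5p bases off the 5' end; return None if the read is too short."""
    seq, qual = seq[trim_5p:], qual[trim_5p:]
    if len(seq) < min_len:
        return None           # read would be discarded
    return seq, qual

# A 40 bp read survives (40 - 10 = 30 >= 25);
# a 30 bp read is dropped (30 - 10 = 20 < 25).
kept = trim_and_filter("A" * 40, "I" * 40)
dropped = trim_and_filter("A" * 30, "I" * 30)
```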
If just getting any result is the goal, that has already been achieved with the other tool. No idea how downstream tools will work on that output, though. At ORG, rnaSPAdes is more robust and will “do something” with reads that fail Trinity, e.g. reads with insufficient QA. Specifically, a lot of very short reads clogs up the Trinity assembly process and the tool dies due to the limited runtime memory available. We won’t be allocating more, or at least not now.
Before I sent my previous history, I had already used fastp to filter the data. Even then, when the fastp output files were used as the input for the Trinity analysis, the results still displayed in red.
To simplify and fully display the entire process, I created a new history and uploaded the two original sequencing files (forward and reverse reads) of one sample, used FastQC to check the quality of the data before and after running fastp, and left these results in the history. Based on the FastQC report and your suggestion, 5’ trimming was performed with fastp, deleting the first 10 bases of each read. I believe the cleaned data should meet the input requirements of Trinity. As a control, rnaSPAdes was also used to assemble the transcript reads. Unfortunately, the Trinity result showed red, while the rnaSPAdes result was blank (0 bytes). I shared the history of this analysis, as shown below:
I also have the impression that rnaSPAdes is more forgiving of the reads than Trinity.
I only have access to the Trinity job submitted on the untrimmed reads. The job was submitted with digital normalization, which should reduce the amount of data used for assembly, but I cannot check the requested memory on ORG.
Biased nucleotide composition at the 5’ ends is common for Illumina RNA-Seq data. I don’t know how much of a problem it causes.
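That 5’ bias is what FastQC’s per-base sequence content plot shows: the base composition at each cycle, skewed over roughly the first 10 positions by random-hexamer priming. A minimal sketch of that per-position count, with made-up reads for illustration:

```python
from collections import Counter

def per_position_composition(reads):
    """Return one Counter of observed bases per read position."""
    counts = []
    for seq in reads:
        for i, base in enumerate(seq):
            if i == len(counts):
                counts.append(Counter())
            counts[i][base] += 1
    return counts

# Toy reads; real input would be thousands of FASTQ sequences
reads = ["ACGTAC", "ACGTTT", "AGGTAC"]
comp = per_position_composition(reads)
print(comp[0])  # first cycle: Counter({'A': 3})
```

Plotting each base’s fraction per position against read index reproduces the FastQC curve; a flat region after the biased head is what you want to keep.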
Dear @igor @jennaj,
I suspect that the reason the Trinity analyses keep failing is that the Galaxy platform allocates insufficient memory for each job. Trinity needs a huge amount of memory compared with other tools. This understanding comes from the paper “De novo transcriptome assembly: A comprehensive cross-species comparison of short-read RNA-Seq assemblers”, GigaScience, 2019, 8:1–16.
Table 3 of this article is screenshotted below:
Something very interesting: as shown in the history (16), the rnaSPAdes analysis produced no sequences when each read was 5’ trimmed by 10 bp with fastp (all other parameters unchanged), but produced 25071 assembled sequences (21) when fastp’s default parameters were used.
Dear @igor ,
Thank you very much. I saw datasets #14 and #15 in the history. How can I choose an older version of the Trinity tool on the Galaxy platform?
I am currently splitting both files of a sample into four smaller files to reduce the memory required for the computation, and I hope this works.
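One caveat when splitting paired files: each chunk must contain whole FASTQ records (4 lines per record), and the forward and reverse files must be split the same way so mates stay in sync. A minimal sketch under those assumptions (file names and chunking scheme are hypothetical, not a specific Galaxy tool):

```python
def split_fastq(lines, n_chunks: int):
    """Split FASTQ lines into n_chunks lists of whole 4-line records."""
    records = [lines[i:i + 4] for i in range(0, len(lines), 4)]
    chunks = [[] for _ in range(n_chunks)]
    for idx, rec in enumerate(records):
        # Record idx goes to the same chunk in the R1 and R2 files,
        # so pairing is preserved as long as both files are in order.
        chunks[idx % n_chunks].extend(rec)
    return chunks

# 8 toy records (32 lines) split 4 ways -> 2 records per chunk
lines = [f"line{i}" for i in range(32)]
parts = split_fastq(lines, 4)
```

Applying the same function with the same `n_chunks` to both the forward and reverse files keeps each read pair in matching chunks.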
During job setup, click the three-blocks (Versions) icon at the top right corner of the middle panel and select any available version from the drop-down menu.
You can use the assembled transcripts and the transcript-to-gene map from the history I shared.
You can copy datasets from one history to another using “See histories side by side” (in the history menu).