Trinity ended; no assembly

Hello,

I am doing a de novo assembly with Trinity. It was running for 14 days and then ended with no error (the job box is green), but both the assembly and the gene to transcripts map are empty (0 bytes).

The only “error” I see in the output is this:
Sequence: CCAATATTAAGCCACGATTAAGGAis smaller than 25 base pairs, skipping
repeating for a bunch of sequences, and nothing else.

The input files are paired-end, RF stranded, Trimmomatic output files, not interleaved. I used them for assembly with Trinity in the same history almost a year ago, and everything worked perfectly.

Does anyone have an idea as to what could be causing this?
Thanks!

Hi @magdalenagrgic

This message means that sequences shorter than 25 bases were skipped during assembly. You might want to rerun Trimmomatic and add a length filter (require a minimum read length of 25 bases) so that only those reads pass through. That reduces the preprocessing that Trinity needs to do during execution.

If this was a rerun started by loading up the prior tool form (via the “rerun” icon), check to see if you are using the most current version of the Trinity tool available on the server where you are working (UseGalaxy.eu?). “Versions” can be navigated from the upper right corner of any tool form. Older versions may no longer be supported.

Try this:

  1. Rerun Trimmomatic to keep only reads >= 25 bases
  2. Rerun Trinity twice – using the original and the most current tool version
  3. If both of those fail, please send in a bug report from the error(s) to the server admins
  4. At the same time, you can follow up here by posting back a #sharing-your-history link. You can post that link publicly or ask a moderator to start up a private chat. We’ll need all inputs and outputs to be undeleted and for you to note which dataset numbers are involved.
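
For reference, the length filter in step 1 corresponds to Trimmomatic's `MINLEN:25` operation. Outside of Galaxy, the same idea can be sketched in plain Python. This is only an illustration, not what Trimmomatic or Trinity does internally: the function names `read_fastq` and `filter_pairs` are made up for this example, and it assumes uncompressed, 4-line-per-record FASTQ files with R1 and R2 in the same order. A pair is dropped if either mate is shorter than the cutoff, so the two outputs stay in sync.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def filter_pairs(r1_in, r2_in, r1_out, r2_out, min_len=25):
    """Write only pairs where both mates have at least min_len bases.

    Assumes both inputs hold the same number of records in the same order.
    Returns (pairs_kept, pairs_dropped).
    """
    kept = dropped = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if len(rec1[1]) >= min_len and len(rec2[1]) >= min_len:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

With real files this would be called with four open file handles, for example `filter_pairs(open("R1.fastq"), open("R2.fastq"), open("R1.min25.fastq", "w"), open("R2.min25.fastq", "w"))`, where the file names are placeholders.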

The job may simply be too large to complete as-is, but filtering the reads by length is one item to address that would have no scientific impact on the results. Other solutions would probably involve one or more of these: downsampling the reads (Seqtk tools), including a reference annotation (“genome guided mode”), not pooling samples, and/or tuning more of the assembly parameters.

Let’s start there :slight_smile:

Hi Jenna :slight_smile: ,

thank you very much for the fast response!

I may have expressed myself a bit awkwardly - it was not a rerun of the same job. What I wanted to say is that these exact datasets were previously used with Trinity as well, so I don’t think there’s something wrong with them. The only difference in this run is that I put all of my samples into one assembly instead of separating them into two different assemblies like the last time. That comes to around 500M reads, so it could be, as you say, that the job was too large.

Thank you for all the suggestions, I’ll try them.
Could you please also tell me how to start a private chat to share my history?

Magdalena

Hi @magdalenagrgic

Thanks for explaining. Combining more reads into a single assembly can definitely impact runtime and performance.

500M reads in the same assembly is about 10 times the size of datasets the authors used for benchmarking statistics: https://trinityrnaseq.github.io/performance/mem.html. Assembly can be impacted for scientific content reasons (depth, coverage, redundancy) – but I don’t think that is the root issue right now. The issue is simply volume versus resources.

What I would suggest is to consider downsampling the reads before assembly. One tool choice is Seqtk. That way you can include all samples together but not run up against computational issues. There is much discussion about assembly strategies at the different bioinformatics forums, including here: https://groups.google.com/g/trinityrnaseq-users.
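
With paired-end data, the important detail when downsampling is that both files are sampled identically so the mates stay matched (with `seqtk sample` this is done by using the same `-s` seed for both files). As a rough sketch of the idea only, and not how Seqtk is implemented, here is a streaming version in Python. The names `read_fastq` and `subsample_pairs` are invented for this example, and it assumes uncompressed 4-line FASTQ with R1 and R2 in the same order. Each pair is kept with probability `fraction`, so the output size is approximate rather than exact.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield 4-line FASTQ records as tuples of stripped lines."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def subsample_pairs(r1_in, r2_in, r1_out, r2_out, fraction, seed=11):
    """Keep each read pair with probability `fraction`; return pairs kept.

    One random draw decides the fate of both mates, so R1 and R2 stay in
    sync. A fixed seed makes the selection reproducible.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    kept = 0
    for rec1, rec2 in zip(read_fastq(r1_in), read_fastq(r2_in)):
        if rng.random() < fraction:
            r1_out.write("\n".join(rec1) + "\n")
            r2_out.write("\n".join(rec2) + "\n")
            kept += 1
    return kept
```

For example, `fraction=0.1` would reduce roughly 500M read pairs to roughly 50M. What fraction is appropriate is a scientific judgment call about depth and coverage that this sketch does not address.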

I’ll start up a direct message. But … I don’t think that reviewing your history right now will change the advice. It sounds like the reads are in the correct technical format, can pass through QA steps, and will assemble when in a smaller batch.

At this forum, a new user’s trust level will increase with time and activity, and the option to start up a direct message will become available. The goal is to try to keep initial questions and general discussion/troubleshooting public. This helps us to build up the knowledge base and helps you to reach more potential helpers :slight_smile: