Trinity align reads and estimate abundance Fatal Error

Hi, I’ve recently encountered an error when running the ‘align reads and estimate abundance’ tool in the Trinity de novo RNA-seq analysis pipeline.
The output is as follows:

Fatal error: Exit code 2 ()
CMD: touch /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa.bowtie.started
CMD: bowtie-build /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa.bowtie
CMD: touch /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa.RSEM.rsem.prepped.started
CMD: rsem-prepare-reference --transcript-to-gene-map /galaxy-repl/main/jobdir/028/998/28998407/working/gene_to_trans.map /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa.RSEM
$VAR1 = [
{
'right' => '/galaxy-repl/main/jobdir/028/998/28998407/working/paired_right.fq',
'left' => '/galaxy-repl/main/jobdir/028/998/28998407/working/paired_left.fq',
'output_dir' => 'output'
}
];
CMD: set -o pipefail && bowtie -q --all --best --strata -m 300 --chunkmbs 512 -X 800 -S -p 6 /galaxy-repl/main/jobdir/028/998/28998407/working/input.fa.bowtie -1 /galaxy-repl/main/jobdir/028/998/28998407/working/paired_left.fq -2 /galaxy-repl/main/jobdir/028/998/28998407/working/paired_right.fq | samtools view -@ 6 -F 4 -S -b | samtools sort -@ 6 -n -o bowtie.bam
Warning: Skipping pair SRR5112684.25647332/1 HWI-D00525:102:C4NM2ANXX:2:2302:6220:36668 length=125/1 because a mate is less than 4 characters long

4NM2ANXX:2:2106:14360:26107 length=125/1 because a mate is less than 4 characters long

reads processed: 917268

reads with at least one reported alignment: 650697 (70.94%)

reads that failed to align: 266571 (29.06%)

Reported 1024386 paired-end alignments
[bam_sort_core] merging from 0 files and 6 in-memory blocks…
CMD: touch bowtie.bam.ok
CMD: convert-sam-for-rsem bowtie.bam bowtie.bam.for_rsem
Number of first and second mates in read SRR5112684.97/1’s full alignments (both mates are aligned) are not matched!
Error, cmd: convert-sam-for-rsem bowtie.bam bowtie.bam.for_rsem died with ret: 65280 at /cvmfs/main.galaxyproject.org/deps/_conda/envs/mulled-v1-3213810583b3c414a873752c2610a95351f8665124407340352879200d8f7bbf/bin/align_and_estimate_abundance.pl line 729.

I’ve found an answer to a similar question on the Galaxy Biostar page (https://biostar.galaxyproject.org/p/28961/), which suggested that the forward and reverse files contained different numbers of reads. However, when I checked with FastQ Interlacer, there were no single reads (all reads were matched between F & R).
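For anyone who wants to double-check pairing outside Galaxy, here is a minimal sketch. It assumes plain 4-line FASTQ records (no wrapped sequences) and read names that may carry a `/1` or `/2` suffix, as in the log above. The function names `read_id` and `first_mismatch` are made up for this example and are not part of any tool.

```python
from itertools import islice, zip_longest


def read_id(header):
    """Return the read name from a FASTQ header, without a /1 or /2 suffix."""
    fields = header[1:].split()          # drop the leading "@", keep first token
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(left_lines, right_lines):
    """Compare two FASTQ inputs record by record.

    Returns (record_number, left_id, right_id) for the first record whose
    names differ (or where one input ran out), or None if all records pair up.
    Assumption: every record is exactly 4 lines long.
    """
    left_headers = islice(left_lines, 0, None, 4)    # every 4th line is a header
    right_headers = islice(right_lines, 0, None, 4)
    pairs = zip_longest(left_headers, right_headers)
    for number, (left, right) in enumerate(pairs, start=1):
        left_id = None if left is None else read_id(left)
        right_id = None if right is None else read_id(right)
        if left_id != right_id:
            return number, left_id, right_id
    return None
```

It accepts any iterable of lines, so open file handles work: `first_mismatch(open("paired_left.fq"), open("paired_right.fq"))`. A result of `None` means the names match in order; it says nothing about how the aligner reports the mates afterwards.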

If anyone has any suggestions regarding this issue, that would be greatly appreciated.

Thanks

Hello @h.mckay

Did you run any QA on these data?

Trimmomatic can eliminate the very short reads (reported as warnings by Trinity) and will result in 4 outputs – 2 datasets for forward + reverse that are still paired after QA and 2 for those that are not.

You may choose to do more with Trimmomatic (remove adaptor, etc) – run FastQC on your data to review original quality and decide. It is common to run FastQC both before and after Trimmomatic, to compare results/success of the QA filtering applied.

Once done, input the “still paired reads” to Trinity and see if that resolves the error.
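To illustrate what "still paired" means, here is a rough sketch of length filtering that keeps the two files in sync. This is not Trimmomatic's implementation, only the general idea: a pair survives only if both mates pass, and a mate whose partner was dropped goes to a separate unpaired output. The function names and the default `min_len` value are assumptions chosen for the example.

```python
def records(lines):
    """Group an iterable of FASTQ lines into 4-line records (tuples)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same iterator four times makes zip consume 4 lines per record.
    return zip(stripped, stripped, stripped, stripped)


def split_by_length(left_lines, right_lines, min_len=36):
    """Split read pairs into still-paired and unpaired sets by mate length.

    Assumption: both inputs list the mates in the same order.
    Returns (paired, left_only, right_only).
    """
    paired, left_only, right_only = [], [], []
    for left, right in zip(records(left_lines), records(right_lines)):
        left_ok = len(left[1]) >= min_len      # record[1] is the sequence line
        right_ok = len(right[1]) >= min_len
        if left_ok and right_ok:
            paired.append((left, right))       # goes to the two paired outputs
        elif left_ok:
            left_only.append(left)             # forward read lost its mate
        elif right_ok:
            right_only.append(right)           # reverse read lost its mate
    return paired, left_only, right_only
```

Only the reads in `paired` correspond to the two "still paired" datasets that should go into Trinity and the abundance estimation step.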

Thanks!

Hi @jennaj

I did run FastQC on my data both pre- and post-Trimmomatic. Trimming eliminated the failures, but a few warnings remained: per sequence GC content, sequence length distribution (a very small percentage were a different length than the rest; the graph was a tight peak at ~115 bp), and sequence duplication levels (28%).

I can’t seem to find whether there are specific quality requirements for running Trinity, but there were no warnings in the Trinity log (the assembly ran successfully).

If you think I should make my quality cutoffs more stringent, please let me know.

Thanks!

Filtering reads by quality (Phred 20 == 99% base-calling accuracy) and length (removing very short sequences) will make the assembly cleaner and less likely to run into resource problems.

This means that the QA needs to be done on both the reads used to generate the Trinity assembly and the reads mapped back against that assembly result.
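As a quick reference for the quality number: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 is a 1% error rate. The sketch below shows the arithmetic and assumes Phred+33 encoded quality strings (the usual encoding for current Illumina data). The function names are made up for the example.

```python
def phred_to_accuracy(q):
    """Convert a Phred quality score to base-calling accuracy (0..1)."""
    return 1 - 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string to Phred scores.

    Assumption: Phred+33 encoding, where the character's ASCII code minus 33
    is the score.
    """
    return [ord(char) - 33 for char in quality_string]


def mean_accuracy(quality_string):
    """Average base-calling accuracy across one read's quality string."""
    scores = decode_phred33(quality_string)
    if not scores:
        return 0.0
    return sum(phred_to_accuracy(q) for q in scores) / len(scores)
```

For example, `phred_to_accuracy(20)` is approximately 0.99 and `phred_to_accuracy(30)` is approximately 0.999.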

The “not matched!” part of the error message seemed suspicious given that your reads are paired up, so I searched the web and found that others ran into this problem when using the RSEM option with Bowtie (see https://github.com/trinityrnaseq/trinityrnaseq/issues/305). Some resolved it by using RSEM with Bowtie2 instead. Maybe try a rerun using the Bowtie2 option? You can toggle between Bowtie and Bowtie2 in the Galaxy-wrapped version of the tool.

A highly fragmented Trinity assembly could also lead to this problem (or that is my guess – if the assembled contigs do not represent full transcripts, then reads could map to different contigs and then “not match”) – but it is definitely worth a try to see if Bowtie2 works first.

You may also want to review the assembly itself, and potentially filter it by length. Any contig that is shorter than your reads will not capture hits, but contigs that are only long enough to capture one end of the read pair can lead to a multi-mapping issue.
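Length filtering of an assembly can be sketched as follows. This is only an illustration in plain Python, not a Galaxy tool or a Trinity utility. The function names are hypothetical, and the cutoff is a parameter you would choose yourself.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)   # emit the previous contig
            header, chunks = line[1:], []
        else:
            chunks.append(line)                 # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)           # emit the last contig


def filter_by_length(lines, min_len):
    """Keep only contigs whose sequence is at least min_len bases long."""
    return [(h, s) for h, s in parse_fasta(lines) if len(s) >= min_len]
```

With an open file handle this would be called as `filter_by_length(open("assembly.fasta"), min_len)`, where `assembly.fasta` is a placeholder name.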

How long should a valid contig be? It depends on the species/genome, but any “contig” that is really short (under ~100 bases, or the length of the original reads) could represent data that wasn’t incorporated into a full-length transcript. That could be related to the original read quality, untrimmed artifacts, or low sequencing depth or coverage. Even upstream issues with how the library was prepared could be a factor.

Meaning, it is possible that full length alternatively spliced contigs (transcripts) were not generated. This could lead to mapping problems … especially if the same reads were used to generate the assembly that were later mapped back to it (a “poor” quality read would map back against itself). The goal is to create a high-quality assembly that does not include partial/duplicated content.

Please review your original assembly reads, clean those up and reassemble as needed, then try Bowtie2 with reads that also were quality filtered, to eliminate read quality and technical factors, and we can follow up from there regarding assembly content as needed.

Note: Trinity is currently problematic at usegalaxy.org, but we expect to have that resolved as a priority fix. Related Q&A about that (Trinity runs on the same cluster as these other tools): “SPADES - Remote job server indicated a problem running or monitoring this job”.

Apologies for the confusion, but we can sort out the different issues, and doing more QA first is probably needed anyway.

Thanks!

Hi @jennaj

Thanks for the info, but unfortunately, switching from Bowtie to Bowtie2 resulted in the same error. I was also confused about the ‘not matched!’ error, since I confirmed all my reads were matched with FastQ Interlacer.

How would I determine if my Trinity assembly is highly fragmented, and if it is, is there a way to fix it?

Thanks

If your reads were not quality trimmed before assembly, then you should do that first.

Resources about the Trinity pipeline include assembly statistics tools.
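As a rough idea of the kind of numbers such statistics report, here is a minimal sketch that computes contig count, total length, and N50 from a list of contig lengths. It is not the Trinity script itself, and the function names are made up for the example. A very low N50 alongside a very large number of short contigs is one hint that an assembly is fragmented.

```python
def n50(lengths):
    """Return the N50: the contig length at which the running total of
    lengths, sorted longest first, reaches half of the assembly size."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_summary(lengths):
    """Collect a few basic statistics for a list of contig lengths."""
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "shortest": min(lengths) if lengths else 0,
        "longest": max(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```

The lengths can come from any FASTA parser, for example `[len(seq) for seq in sequences]`, where `sequences` is a placeholder for the contig sequences you loaded.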

That said, I still think this could be a server-side issue. These tools were designed to work with data that isn’t “perfect”. You could also try alternatives to RSEM, such as Salmon or Kallisto, as others needed to do (see the Trinity authors’ reply on the ticket at https://github.com/trinityrnaseq/trinityrnaseq/issues/305).

But if this is an actual cluster issue, those wouldn’t work either. In testing today, many tools that run on the same cluster as this one failed. Our administrator is working on it.