Trinity error in usegalaxy.org site

I have RNA-seq datasets. After aligning them with HISAT2, I collected the unaligned reads.

I used the unaligned reads output by HISAT2 as input to Trinity, and the following error was reported.
Could anyone help me resolve this error?

----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------

Converting input files. (in parallel)
Thursday, April 4, 2019: 02:26:37 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat.readcount
Thursday, April 4, 2019: 02:26:37 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /2 --to-fasta >> right.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat.readcount

Thread 2 terminated abnormally: Error, counts of reads in FQ: 2061214.5 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 2.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 2
main::prep_seqs(‘ARRAY(0x2857f00)’, ‘fq’, ‘right’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2

Thread 1 terminated abnormally: Error, counts of reads in FQ: 2086075.75 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 1.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 1
main::prep_seqs(‘ARRAY(0x26d84a0)’, ‘fq’, ‘left’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1314 thread 1
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1314 thread 1

Trinity run failed. Must investigate error above.

Welcome, @AISWARYA_JAYAPRAKASH!

Trinity can be run in paired end mode. This requires two distinct inputs, one dataset for the forward/left reads and one dataset for the reverse/right reads. The same base sequence identifiers must be included in both inputs and these inputs cannot be the same dataset.

Trinity can also be run in single end mode. This requires a single input, one dataset containing all reads. This should not be an “interlaced” fastq dataset – it should contain forward reads, or reverse reads, but not both.

Also, the fastq input(s) need to represent complete records for Trinity.

HISAT2 can handle incomplete fastq records (perhaps truncated during Upload?). It discards any incomplete records to the unmapped output.
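One quick way to look for incomplete records is to count lines. The read counts in the Trinity log above are fractional (for example 2061214.5), which suggests a line count was divided by four and did not come out whole. The sketch below is only an illustration, assuming the common four-lines-per-record FASTQ layout; the function name `fastq_line_stats` is made up for this example and is not part of Trinity or Galaxy.

```python
import io


def fastq_line_stats(handle):
    """Count lines in a FASTQ text handle and derive the implied read count.

    Assumption: every record is exactly four lines (header, sequence,
    '+' separator, qualities). A line count that is not a multiple of
    four then points to at least one incomplete record.
    """
    n_lines = sum(1 for _ in handle)
    return {
        "lines": n_lines,
        "reads": n_lines / 4,            # fractional when a record is incomplete
        "complete": n_lines % 4 == 0,
    }


if __name__ == "__main__":
    good = io.StringIO("@r1/1\nACGT\n+\nIIII\n")
    truncated = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n")
    print(fastq_line_stats(good))
    print(fastq_line_stats(truncated))
```

For a real dataset, pass an opened file instead of the `io.StringIO` stand-ins, e.g. `open("reads.fastq")`, where the filename is a placeholder.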

Check your run to see where the problem was introduced versus the above usage help. The error message suggests a combination of problems.

Dear Jennifer

My RNA-seq datasets are paired-end, and I have separate fastqsanger files for the forward and reverse reads.

After HISAT2 alignment, the unaligned reads are also written to separate forward and reverse files in fastqsanger format.

Running a paired-end assembly in Trinity gives me the same error again:


Trinity Phase 1: Clustering of RNA-Seq Reads


Converting input files. (in parallel)
Wednesday, April 10, 2019: 02:28:39 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat.readcount
Wednesday, April 10, 2019: 02:28:39 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /2 --to-fasta >> right.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat.readcount

Thread 1 terminated abnormally: Error, cmd: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat.readcount died with ret 256 at /opt/packages/trinity/2.2.0/Trinity line 2206.

Thread 2 terminated abnormally: Error, counts of reads in FQ: 2343568.75 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 2.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 2
main::prep_seqs(‘ARRAY(0x28a75f8)’, ‘fq’, ‘right’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2

Trinity run failed. Must investigate error above.

Thanks for explaining more.

The problem is likely that the two unmapped read datasets no longer contain the same base sequence identifiers (or may not have contained them originally, when mapped). HISAT2 does not require matched-up inputs, but Trinity does.

Try running the post-HISAT2 unmapped datasets through these two tools to generate matched up inputs appropriate for Trinity. You’ll probably lose some reads if the forward or reverse for any are not both present in the “unmapped” output, and those extra reads are what is most likely causing the error (read counts between both inputs are not the same).

Tool Group: FASTA/FASTQ

  • FASTQ interlacer on paired end reads

  • FASTQ de-interlacer on paired end reads

Trimmomatic will also create matched paired-end results, but it might apply unnecessary read QA at this point in the analysis. If you didn’t do QA first (before mapping), consider doing it: both mapping and assembly tend to work better with cleaned-up reads (adaptors removed, etc.).

Whether or not you already did or decide now to do QA before mapping, the “unmapped” results from HISAT2 will still need to have the reads paired up again before inputting the data to Trinity.
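To make the idea of "pairing up" concrete, here is a minimal Python sketch that keeps only reads whose base identifier appears in both the forward and reverse files. It is not how the Galaxy interlacer/de-interlacer tools are implemented; it assumes four-line FASTQ records and mate suffixes of `/1` and `/2` (as in the fastool commands in the log), and the helper names `read_fastq`, `base_id`, and `pair_up` are invented for illustration. It also holds the reverse reads in memory, which may not suit very large datasets.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a four-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:          # end of file (or a blank line): stop reading
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def base_id(header):
    """Drop the leading '@', any description after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def pair_up(forward, reverse):
    """Return (forward_records, reverse_records) limited to shared base IDs.

    Output order follows the forward input, so record N in one list is the
    mate of record N in the other.
    """
    reverse_by_id = {base_id(rec[0]): rec for rec in reverse}
    fwd_out, rev_out = [], []
    for rec in forward:
        mate = reverse_by_id.get(base_id(rec[0]))
        if mate is not None:
            fwd_out.append(rec)
            rev_out.append(mate)
    return fwd_out, rev_out


if __name__ == "__main__":
    fwd = io.StringIO("@a/1\nAC\n+\nII\n@b/1\nGG\n+\nII\n")
    rev = io.StringIO("@b/2\nTT\n+\nII\n@c/2\nCC\n+\nII\n")
    kept_fwd, kept_rev = pair_up(read_fastq(fwd), read_fastq(rev))
    print(len(kept_fwd), "pairs kept")
```

Reads `a` and `c` in the toy data have no mate, so they are dropped, which mirrors the read loss described above.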

Thanks and let us know how using matched pairs works out.

Dear Jennifer,

I did QA on the reads before running HISAT2. The FastQC report showed no adapter contamination and good per-base sequence quality, with the following flags:

Warning (orange): Per tile sequence quality and Sequence Duplication Levels.

Fail (red): Per base sequence content.

I tried running FASTQ interlacer on paired end reads on my HISAT2 unaligned reads. The job terminated with the following error. Kindly help me resolve this issue.

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/bin/gx-fastq-paired-end-interlacer", line 6, in <module>
    sys.exit(galaxy_utils.sequence.scripts.fastq_paired_end_interlacer.main())
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/scripts/fastq_paired_end_interlacer.py", line 36, in main
    for i, mate1 in enumerate(fastqReader(path=mate1_filename, format=type)):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 616, in __iter__
    yield next(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/six.py", line 564, in next
    return type(self).__next__(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 592, in __next__
    assert fastq_header.startswith('@'), 'Invalid fastq header: %s' % fastq_header
AssertionError: Invalid fastq header: BZh91AY&SY �JN z �

I am also attaching the output of head on the fastq files for my forward and reverse reads.

Forward read

@HWI-ST1328:280:H98KRADXX:2:1101:3382:2090/1
GAAGCATAGTAGCCCCATCTGGATGAAGAACTATCATCCTTACAAGATCAATGACAGGAAATATCATTGAAATTGGCCATGACCGAAGCAAGTTCAGCAA

@AISWARYA_JAYAPRAKASH

To make sure there wasn’t an uncaught bug in one of the tools you are using, I ran a test of HISAT2 with some test data. The outputs for unaligned forward and reverse reads are complete, correctly formatted fastq datasets that run through the Fastq Interlacer tool without error.

This indicates that there is some formatting problem in your fastq data. Not all tools will error on formatting problems; some will instead ignore or skip over malformed reads (mapping tools, in particular, will do this). My guess is that your data contains at least one malformed read.

To troubleshoot fastq formatting, the tool Fastq Groomer can be run with default settings. These settings will not alter the content, but will allow the tool to be used as a format validator: if any reads do not fit the fastq format, the job will fail and the first occurrence of a problem will be reported in the full error report. To view the error (if you get one), click the “i” Job Details icon, then click stderr or stdout on that report to review the problem.

For reference, fastq format is described in this FAQ: Common datatypes explained
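As a rough picture of what "reporting the first occurrence of a problem" can look like, here is a small Python sketch of a format check. It is not Fastq Groomer's implementation, only an illustration assuming four-line records; the function names are hypothetical. One extra check is included because the header quoted in the traceback earlier in this thread starts with `BZh`, which is the signature of bzip2-compressed data; that is an observation from the error text, not something confirmed in the thread.

```python
# Leading bytes that identify common compressed formats.
MAGIC = {b"BZh": "bzip2", b"\x1f\x8b": "gzip"}


def sniff_compression(first_bytes):
    """Return the name of a compression format if the bytes start with its signature."""
    for magic, name in MAGIC.items():
        if first_bytes.startswith(magic):
            return name
    return None


def first_fastq_problem(data):
    """Return a description of the first problem found in FASTQ bytes, or None.

    Assumption: plain-text FASTQ with exactly four lines per record.
    """
    kind = sniff_compression(data[:4])
    if kind:
        return "data looks %s-compressed, not plain-text FASTQ" % kind

    lines = data.decode("ascii", errors="replace").splitlines()
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        number = start // 4 + 1
        if len(record) < 4:
            return "record %d is truncated (%d of 4 lines)" % (number, len(record))
        header, seq, plus, qual = record
        if not header.startswith("@"):
            return "record %d: header does not start with '@'" % number
        if not plus.startswith("+"):
            return "record %d: third line does not start with '+'" % number
        if len(seq) != len(qual):
            return "record %d: sequence and quality length differ" % number
    return None


if __name__ == "__main__":
    print(first_fastq_problem(b"@r1\nACGT\n+\nIIII\n"))
    print(first_fastq_problem(b"@r1\nACGT\n+\nIIII\n@r2\nAC\n"))
```

The sketch reads the whole input into memory, which is fine for a small sample (for example the first few thousand lines of a dataset) but not intended for full-size files.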

Please give that a try and we can follow up more from there with the result.