Trinity error in usegalaxy.org site

AISWARYA_JAYAPRAKASH · April 7, 2019, 8:34pm

I have RNA-seq datasets. After aligning them with HISAT2, I collected the unaligned reads.

I gave the unaligned reads output by HISAT2 to Trinity as input, and the following error was reported.
Could anyone help me resolve this error?

----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------

Converting input files. (in parallel)
Thursday, April 4, 2019: 02:26:37 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat.readcount
Thursday, April 4, 2019: 02:26:37 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /2 --to-fasta >> right.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat.readcount

Thread 2 terminated abnormally: Error, counts of reads in FQ: 2061214.5 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384661.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 2.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 2
main::prep_seqs(‘ARRAY(0x2857f00)’, ‘fq’, ‘right’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2

Thread 1 terminated abnormally: Error, counts of reads in FQ: 2086075.75 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/dataset_30384659.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 1.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//22959260/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 1
main::prep_seqs(‘ARRAY(0x26d84a0)’, ‘fq’, ‘left’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1314 thread 1
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1314 thread 1

Trinity run failed. Must investigate error above.

jennaj · April 8, 2019, 3:44pm

Welcome, @AISWARYA_JAYAPRAKASH!

Trinity can be run in paired end mode. This requires two distinct inputs, one dataset for the forward/left reads and one dataset for the reverse/right reads. The same base sequence identifiers must be included in both inputs and these inputs cannot be the same dataset.

Trinity can also be run in single end mode. This requires a single input, one dataset containing all reads. This should not be an “interlaced” fastq dataset – it should contain forward reads, or reverse reads, but not both.

Also, the fastq input(s) need to represent complete records for Trinity.

HISAT2 can handle incomplete fastq records (perhaps truncated during Upload?). It discards any incomplete records to the unmapped output.

Check your run to see where the problem was introduced versus the above usage help. The error message suggests a combination of problems.
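The fractional read counts in the Trinity log (for example 2061214.5) are consistent with a line count that is not a multiple of four, since the log says the figure comes from `wc -l`. Below is a minimal Python sketch of the same arithmetic, assuming an uncompressed fastq file with exactly four lines per record. The function names `fastq_record_estimate` and `looks_complete` are illustrative only and are not part of Trinity or Galaxy.

```python
def fastq_record_estimate(path):
    """Return (line_count, line_count / 4) for a plain-text fastq file.

    Assumes four lines per record; a fractional second value hints
    at a truncated or otherwise incomplete record somewhere in the file.
    """
    with open(path, "rb") as handle:
        line_count = sum(1 for _ in handle)
    return line_count, line_count / 4


def looks_complete(path):
    """True when the file is non-empty and its line count is a multiple of four.

    This is a necessary condition for well-formed four-line fastq,
    not a sufficient one: it does not inspect the record contents.
    """
    line_count, _ = fastq_record_estimate(path)
    return line_count > 0 and line_count % 4 == 0
```

If `looks_complete` returns `False` for either input, the dataset probably has an incomplete record and is worth re-uploading or validating before it goes to Trinity.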

AISWARYA_JAYAPRAKASH · April 10, 2019, 12:17pm

Dear Jennifer

My RNA-seq datasets are paired end, and I have separate fastqsanger files for the forward and reverse reads.

After HISAT2 alignment, the unaligned reads are also output as separate fastqsanger files for the forward and reverse reads.

When I run a paired-end assembly in Trinity, it gives me the same error again:

Trinity Phase 1: Clustering of RNA-Seq Reads

Converting input files. (in parallel)
Wednesday, April 10, 2019: 02:28:39 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat.readcount
Wednesday, April 10, 2019: 02:28:39 CMD: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /2 --to-fasta >> right.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat.readcount

Thread 1 terminated abnormally: Error, cmd: cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat | /opt/packages/trinity/2.2.0/trinity-plugins/fastool/fastool --append /1 --to-fasta >> left.fa 2> /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384638.dat.readcount died with ret 256 at /opt/packages/trinity/2.2.0/Trinity line 2206.

Thread 2 terminated abnormally: Error, counts of reads in FQ: 2343568.75 (as per cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/dataset_30384640.dat | wc -l) doesn’t match fastool’s report of FA records: 0 at /opt/packages/trinity/2.2.0/Trinity line 3087 thread 2.
main::ensure_complete_FQtoFA_conversion(‘cat /pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/da…’, ‘/pylon5/mc48nsp/xcgalaxy/main/staging//23032024/inputs/datase…’) called at /opt/packages/trinity/2.2.0/Trinity line 2116 thread 2
main::prep_seqs(‘ARRAY(0x28a75f8)’, ‘fq’, ‘right’, undef) called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2
eval {…} called at /opt/packages/trinity/2.2.0/Trinity line 1317 thread 2

Trinity run failed. Must investigate error above.

jennaj · April 10, 2019, 1:26pm

Thanks for explaining more.

The problem is likely that the two unmapped read inputs no longer contain the same base sequence identifiers (or may not have originally, when mapped). HISAT2 does not require matched inputs, but Trinity does.

Try running the post-HISAT2 unmapped datasets through these two tools to generate matched inputs appropriate for Trinity. You will probably lose some reads wherever the forward and reverse mates are not both present in the “unmapped” output; those extra reads are most likely what is causing the error (the read counts of the two inputs differ).

Tool Group: FASTA/FASTQ

FASTQ interlacer on paired end reads
FASTQ de-interlacer on paired end reads
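To illustrate what "matched up" means here, the following is a rough Python sketch of the pairing idea: keep a read only if its mate is present in the other file. It is a stand-in for the concept, not the implementation used by the Galaxy tools listed above. It assumes uncompressed four-line fastq records and identifiers that end in `/1` and `/2`, as in the reads from this thread. The function names are hypothetical, and the reverse reads are held in memory, so it suits small test files rather than full datasets.

```python
def base_id(header):
    """Strip the leading '@', any description, and a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def read_fastq(path):
    """Yield (header, seq, plus, qual) tuples from an uncompressed fastq file."""
    with open(path) as handle:
        while True:
            record = [handle.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # end of file (or an empty header line)
                break
            yield tuple(record)


def repair_pairs(fwd_in, rev_in, fwd_out, rev_out):
    """Write only reads whose mate exists in both inputs; return pairs kept."""
    # Index the reverse reads by base identifier (memory-heavy for large files).
    reverse_by_id = {base_id(rec[0]): rec for rec in read_fastq(rev_in)}
    kept = 0
    with open(fwd_out, "w") as out1, open(rev_out, "w") as out2:
        for rec in read_fastq(fwd_in):
            mate = reverse_by_id.get(base_id(rec[0]))
            if mate is None:
                continue  # orphan forward read: no reverse mate present
            out1.write("\n".join(rec) + "\n")
            out2.write("\n".join(mate) + "\n")
            kept += 1
    return kept
```

After a step like this, both outputs hold the same number of records in the same order, which is the property Trinity's paired-end mode expects.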

Trimmomatic will also create matched paired-end results but might apply unnecessary read QA at this point in the analysis. If you didn’t do QA first (before mapping), consider doing it: both mapping and assembly tend to work better with cleaned-up reads (adaptors removed, etc.).

Whether you already did QA before mapping or decide to do it now, the “unmapped” results from HISAT2 will still need to have the reads paired up again before you input the data to Trinity.

Thanks and let us know how using matched pairs works out.

AISWARYA_JAYAPRAKASH · April 13, 2019, 7:34am

Dear Jennifer,

I did QA on the reads before running HISAT2. The FastQC report showed no adapter contamination and good per-base sequence quality, with the following flags:

Warning (orange): Per tile sequence quality and Sequence duplication levels.

Fail (red): Per base sequence content.

I tried running FASTQ interlacer on paired end reads on my HISAT2 unaligned reads. The job was terminated with the following error. Kindly help me resolve this issue.

Traceback (most recent call last):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/bin/gx-fastq-paired-end-interlacer", line 6, in <module>
    sys.exit(galaxy_utils.sequence.scripts.fastq_paired_end_interlacer.main())
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/scripts/fastq_paired_end_interlacer.py", line 36, in main
    for i, mate1 in enumerate(fastqReader(path=mate1_filename, format=type)):
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 616, in __iter__
    yield next(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/six.py", line 564, in next
    return type(self).__next__(self)
  File "/cvmfs/main.galaxyproject.org/deps/_conda/envs/__galaxy_sequence_utils@1.1.2/lib/python2.7/site-packages/galaxy_utils/sequence/fastq.py", line 592, in __next__
    assert fastq_header.startswith('@'), 'Invalid fastq header: %s' % fastq_header
AssertionError: Invalid fastq header: BZh91AY&SY �JN z �

AISWARYA_JAYAPRAKASH · April 13, 2019, 7:44am

I am also attaching the first lines (head) of the fastq files for my forward and reverse reads.

Forward read

@HWI-ST1328:280:H98KRADXX:2:1101:3382:2090/1
GAAGCATAGTAGCCCCATCTGGATGAAGAACTATCATCCTTACAAGATCAATGACAGGAAATATCATTGAAATTGGCCATGACCGAAGCAAGTTCAGCAA

jennaj · April 15, 2019, 3:16pm

@AISWARYA_JAYAPRAKASH

To make sure there wasn’t an uncaught bug in one of the tools you are using, I ran a test for HISAT2 using some test data. The outputs for unaligned F/R reads are complete, correctly formatted fastq datasets that run through the Fastq Interlacer tool correctly.

This suggests a formatting problem in your fastq data. Not all tools will error on formatting problems; some will instead ignore or skip malformed reads (mapping tools, in particular, do this). My guess is that your data contains at least one malformed read.

To troubleshoot fastq formatting, the tool Fastq Groomer can be run with default settings. These settings will not alter the content but allow the tool to be used as a format validator: if any reads do not fit the fastq format, the job will fail and the first occurrence of a problem will be reported in the full error report. To view the error (if you get one), click the “i” Job Details icon, then click stderr or stdout on that report to review the problem.

For reference, fastq format is described in this FAQ: Common datatypes explained
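As a rough illustration of what a format check looks for, here is a small Python sketch that reports the first problem it finds in a file. It is not how Fastq Groomer works internally. It assumes simple four-line fastq records and uses hypothetical function names. It also checks the first bytes for a gzip or bzip2 signature, because the invalid header quoted in the interlacer traceback earlier in this thread starts with `BZh9`, which matches the bzip2 signature, so a compressed input may be one possibility worth ruling out.

```python
# Leading bytes that identify common compressed formats.
MAGIC = {b"\x1f\x8b": "gzip", b"BZh": "bzip2"}


def detect_compression(path):
    """Return 'gzip', 'bzip2', or None based on the first bytes of the file."""
    with open(path, "rb") as handle:
        start = handle.read(3)
    for magic, name in MAGIC.items():
        if start.startswith(magic):
            return name
    return None


def first_fastq_problem(path):
    """Describe the first malformed record found, or return None if none is found."""
    compression = detect_compression(path)
    if compression:
        return "file starts with a %s signature, not '@'" % compression
    with open(path, errors="replace") as handle:
        record_number = 0
        while True:
            header = handle.readline()
            if not header:
                return None  # clean end of file
            record_number += 1
            seq, plus, qual = (handle.readline() for _ in range(3))
            if not qual:
                return "record %d is truncated" % record_number
            header, seq, plus, qual = (
                s.rstrip("\r\n") for s in (header, seq, plus, qual)
            )
            if not header.startswith("@"):
                return "record %d: header does not start with '@'" % record_number
            if not plus.startswith("+"):
                return "record %d: third line does not start with '+'" % record_number
            if len(seq) != len(qual):
                return "record %d: sequence and quality lengths differ" % record_number
```

Calling `first_fastq_problem` on each of the two unmapped-read files would point to the first record to inspect, in the same spirit as reading the Fastq Groomer error report.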

Please give that a try and we can follow up more from there with the result.
