Q: Error encountered during genome assembly using the SPAdes tool. A: The tool requires intact paired-end reads as input.

Hi
I am using the Galaxy server for genome assembly with SPAdes, but the following error occurred:
Command line: /cvmfs/main.galaxyproject.org/deps/_conda/envs/__spades@3.12.0/bin/spades.py -o /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working --disable-gzip-output --only-assembler --careful -t 7 -m 288 -k 33,55,91 --cov-cutoff auto --pe1-fr --pe1-1 fastq:/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39274260.dat --pe1-2 fastq:/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39272829.dat

System information:
SPAdes version: 3.12.0
Python version: 3.8.1
OS: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.10

Output dir: /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working
Mode: ONLY assembling (without read error correction)
Debug mode is turned OFF

Dataset parameters:
Multi-cell mode (you should set '--sc' flag if input data was obtained with MDA (single-cell) technology or --meta flag if processing metagenomic dataset)
Reads:
Library number: 1, library type: paired-end
orientation: fr
left reads: ['/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39274260.dat']
right reads: ['/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39272829.dat']
interlaced reads: not specified
single reads: not specified
merged reads: not specified
Assembly parameters:
k: [33, 55, 91]
Repeat resolution is enabled
Mismatch careful mode is turned ON
MismatchCorrector will be used
Coverage cutoff is turned ON and threshold will be auto-detected
Other parameters:
Dir for temp files: /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/tmp
Threads: 7
Memory limit (in Gb): 288

======= SPAdes pipeline started. Log can be found here: /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/spades.log

===== Assembling started.

== Running assembler: K33

0:00:00.000 4M / 4M INFO General (main.cpp : 74) Loaded config from /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/K33/configs/config.info
0:00:00.001 4M / 4M INFO General (main.cpp : 74) Loaded config from /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/K33/configs/careful_mode.info
0:00:00.002 4M / 4M INFO General (memory_limit.cpp : 49) Memory limit set to 288 Gb
0:00:00.002 4M / 4M INFO General (main.cpp : 87) Starting SPAdes, built from N/A, git revision N/A
0:00:00.002 4M / 4M INFO General (main.cpp : 88) Maximum k-mer length: 128
0:00:00.002 4M / 4M INFO General (main.cpp : 89) Assembling dataset (/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/dataset.info) with K=33
0:00:00.003 4M / 4M INFO General (main.cpp : 90) Maximum # of threads to use (adjusted due to OMP capabilities): 7
0:00:00.003 4M / 4M INFO General (launch.hpp : 51) SPAdes started
0:00:00.003 4M / 4M INFO General (launch.hpp : 58) Starting from stage: construction
0:00:00.003 4M / 4M INFO General (launch.hpp : 65) Two-step RR enabled: 0
0:00:00.003 4M / 4M INFO StageManager (stage.cpp : 132) STAGE == de Bruijn graph construction
0:00:00.020 4M / 4M INFO General (read_converter.hpp : 77) Converting reads to binary format for library #0 (takes a while)
0:00:00.020 4M / 4M INFO General (read_converter.hpp : 78) Converting paired reads
0:00:00.574 100M / 100M INFO General (binary_converter.hpp : 93) 16384 reads processed
0:00:00.972 116M / 116M INFO General (binary_converter.hpp : 93) 32768 reads processed
0:00:01.746 144M / 144M INFO General (binary_converter.hpp : 93) 65536 reads processed
0:00:03.283 204M / 204M INFO General (binary_converter.hpp : 93) 131072 reads processed
0:00:06.374 320M / 320M INFO General (binary_converter.hpp : 93) 262144 reads processed
0:00:12.521 548M / 548M INFO General (binary_converter.hpp : 93) 524288 reads processed
0:00:34.567 908M / 908M INFO General (binary_converter.hpp : 93) 1048576 reads processed
0:01:09.485 908M / 908M INFO General (binary_converter.hpp : 93) 2097152 reads processed
0:02:18.535 912M / 912M INFO General (binary_converter.hpp : 93) 4194304 reads processed
0:04:48.246 920M / 920M INFO General (binary_converter.hpp : 93) 8388608 reads processed
0:06:31.981 936M / 936M ERROR General (paired_readers.hpp : 58) The number of left read-pairs is larger than the number of right read-pairs
0:06:31.981 936M / 936M ERROR General (paired_readers.hpp : 60) Unequal number of read-pairs detected in the following files: /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39274260.dat /pylon5/mc48nsp/xcgalaxy/main/staging/28325065/inputs/dataset_39272829.dat

== Error == system call for: "['/cvmfs/main.galaxyproject.org/deps/_conda/pkgs/spades-3.12.0-1/share/spades-3.12.0-1/bin/spades-core', '/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/K33/configs/config.info', '/pylon5/mc48nsp/xcgalaxy/main/staging/28325065/working/K33/configs/careful_mode.info']" finished abnormally, err code: 255
What can I do? Kindly help me solve this problem. Thank you.


Have you already checked the input files?

The number of left read-pairs is larger than the number of right read-pairs
0:06:31.981 936M / 936M ERROR General (paired_readers.hpp : 60) Unequal number of read-pairs

If you work with paired-end data, the forward and reverse files must contain the same number of reads.
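For a quick check outside FastQC, the two counts can be compared directly. Below is a minimal Python sketch, assuming plain (uncompressed) 4-line FASTQ with no line-wrapped sequences; the function names are hypothetical and not part of Galaxy or SPAdes.

```python
from typing import Iterable, Tuple


def count_fastq_records(lines: Iterable[str]) -> int:
    """Count records in a FASTQ stream.

    Assumes the common 4-line layout (header, sequence, '+', quality)
    with no line-wrapped sequences; blank lines are ignored.
    """
    n_lines = sum(1 for line in lines if line.strip())
    if n_lines % 4 != 0:
        # A partial record usually means a truncated upload or a non-FASTQ file.
        raise ValueError(
            f"{n_lines} non-blank lines is not a multiple of 4; "
            "the file may be truncated or not plain 4-line FASTQ"
        )
    return n_lines // 4


def check_pair_counts(
    forward: Iterable[str], reverse: Iterable[str]
) -> Tuple[int, int]:
    """Count reads in both mate files and print whether the totals agree."""
    n_fwd = count_fastq_records(forward)
    n_rev = count_fastq_records(reverse)
    if n_fwd == n_rev:
        print(f"OK: {n_fwd} reads in each file")
    else:
        print(f"Mismatch: {n_fwd} forward vs {n_rev} reverse reads")
    return n_fwd, n_rev
```

An open file handle is an iterable of lines, so usage might look like `with open("reads_1.fastq") as f, open("reads_2.fastq") as r: check_pair_counts(f, r)` (file names are placeholders); for gzipped files, `gzip.open(path, "rt")` yields text lines the same way. Equal counts are necessary but not sufficient: mates are also expected to appear in the same order in both files.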


@animeshsen92

There are a few ways to ensure that both ends of each pair are present in your datasets. Examples below:

  1. Use a QA tool such as Trimmomatic. In paired-end mode it produces four outputs: two for reads still paired after QA (forward and reverse) and two for reads no longer paired after QA (meaning one of the mates didn't pass QA). Give SPAdes the two paired outputs. In most cases, some basic QA should be done before assembly anyway.
  2. Cycle the reads through Fastq Interlacer then Fastq Deinterlacer.
  3. Cycle the reads through the SeqTk tools: mergepe, then dropse, then seq (run seq twice: once to output the forward reads, once to output the reverse reads).
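Options 2 and 3 both come down to matching mates by read name and setting aside reads whose mate is missing. The idea can be sketched in Python as follows. This is an illustration, not the implementation used by Fastq Interlacer or SeqTk; it assumes 4-line FASTQ and read names that differ between mates only by an optional `/1` or `/2` suffix, and it holds the reverse file in memory, so it suits small test files rather than full datasets. All function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, '+' line, quality


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records (no line-wrapped sequences; blank lines skipped)."""
    buf: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        buf.append(line)
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("incomplete final FASTQ record")


def base_id(header: str) -> str:
    """Read name without '@', trailing comments, or a /1 or /2 mate suffix."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair_pairs(
    forward: Iterable[str], reverse: Iterable[str]
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Keep only reads whose mate is present in the other file.

    Returns (paired_forward, paired_reverse, orphans). Paired output follows
    the order of the forward file, so mates line up record by record.
    """
    rev_by_id: Dict[str, Record] = {base_id(r[0]): r for r in read_fastq(reverse)}
    paired_f: List[Record] = []
    paired_r: List[Record] = []
    orphans: List[Record] = []
    for rec in read_fastq(forward):
        mate = rev_by_id.pop(base_id(rec[0]), None)
        if mate is None:
            orphans.append(rec)  # forward read with no reverse mate
        else:
            paired_f.append(rec)
            paired_r.append(mate)
    orphans.extend(rev_by_id.values())  # reverse reads with no forward mate
    return paired_f, paired_r, orphans
```

The two paired lists are what would go to the assembler; the orphans correspond to the "no longer paired" outputs described in option 1.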

Any of the above could be incorporated into a Workflow.

Most tools do not accept interleaved/interlaced FASTQ input, but combining the reads into interleaved/interlaced format and then splitting them back out will remove reads that are unpaired.

Note: Interleaved is the same thing as Interlaced in this context; the terms are interchangeable. It simply means that forward and reverse reads are combined, in order, into the same fastq dataset. An example is here: https://galaxyproject.org/support/ncbi-sra-fastq/#interlaced-forward-and-reverse-reads
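To make the interleaved layout concrete, here is a minimal Python sketch that merges two mate files (given as lists of lines) into one alternating stream and splits it back by position. It assumes 4-line FASTQ records with mates already in matching order; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def interleave(forward: Sequence[str], reverse: Sequence[str]) -> List[str]:
    """Merge two mate files into one stream: F1, R1, F2, R2, ...

    Each argument is a list of FASTQ lines (4 lines per record), with
    mates assumed to be in the same order in both inputs.
    """
    if len(forward) != len(reverse) or len(forward) % 4 != 0:
        raise ValueError("inputs must hold the same number of 4-line records")
    merged: List[str] = []
    for i in range(0, len(forward), 4):
        merged.extend(forward[i:i + 4])  # forward mate of this pair
        merged.extend(reverse[i:i + 4])  # reverse mate of the same pair
    return merged


def deinterleave(lines: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split an interleaved stream back into forward and reverse lines.

    Positional only: this checks that the stream holds whole pairs,
    but it cannot tell whether two neighbouring records are true mates.
    """
    if len(lines) % 8 != 0:
        raise ValueError("interleaved FASTQ must hold complete pairs of 4-line records")
    forward: List[str] = []
    reverse: List[str] = []
    for i in range(0, len(lines), 8):
        forward.extend(lines[i:i + 4])
        reverse.extend(lines[i + 4:i + 8])
    return forward, reverse
```

Because positional splitting alone cannot detect an unpaired read, removing unpaired reads also requires comparing read names while combining.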

Thanks @gbbio!


Thank you very much for your valuable suggestions. I am actually a beginner in this whole-genome project. If I provide you the raw .fastq files of the sequences, could you do the assembly part of this project?

Hi

I have also encountered a similar error when I tried to assemble my data.

ERROR General (paired_readers.cpp : 41) Unequal number of read-pairs detected in the following files: /scratch1/03166/xcgalaxy/main/staging/42472016/working/paired_reads1/M18_2005.L350_FDSW220046255-3r_1.fastq.fastq /scratch1/03166/xcgalaxy/main/staging/42472016/working/paired_reads1/M18_2005.L350_FDSW220046255-3r_2.fastq.fastq

How can I check whether my read pairs have equal numbers of reads? Do I need to use a specific tool? I ran FastQC on my data and, if I am not mistaken, the forward and reverse files have the same number of reads.

Thanks in advance:)

You are probably using another tool between FastQC and SPAdes, so double-check that first. Then run FastQC on the exact files that you give SPAdes as input to check the number of reads. Also see the post above about interleaved/interlaced fastq input. If you are still running into this problem, please open a new topic; this one is very old and already solved. Mention the steps you ran before SPAdes and where your fastq files came from (are they your own, or downloaded from SRA, for example?).
