Unicycler - hybrid assembly failure

Dear All,

I’ve recently run into a problem with Unicycler. I tried to perform a hybrid assembly using:

  1. trimmed Illumina reads R1 (0.17 Gb) + R2 (0.16 Gb); format: fastqsanger.gz

  2. nanopore reads (2.3 Gb); format: fastqsanger

Unicycler assembles the Illumina reads and the Nanopore reads without problems when each set is used on its own. However, it fails to generate a hybrid assembly. Any suggestions?

Could it be related to the working memory available on the Galaxy server?

thanks in advance,

Piotr

PS: Here is the error report:

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified 

tput: No value for $TERM and no -T specified

/pylon5/mc48nsp/xcgalaxy/main/staging/23588931/command.sh: line 95: 38467 Segmentation fault      (core dumped) unicycler -t "${GALAXY_SLOTS:-4}" -o ./ --verbosity 3 --pilon_path $pilon -1'fq1.fastq.gz' -2 'fq2.fastq.gz' -l lr.fastq --mode 'conservative' --min_fasta_length '100' --linear_seqs '0' --min_kmer_frac '0.2' --max_kmer_frac '0.95' --kmer_count '10' --depth_filter '0.25' --start_gene_id '90.0' --start_gene_cov '95.0' --min_polish_size '1000' --min_component_size '1000' --min_dead_end_size '1000' --scores '3,-6,-5,-2'

Hi @piotr-piotr-majewski!!

Please try running the job again now. I would suggest at least two reruns, to see whether one of them lands on a non-problematic cluster node.

There were some cluster issues yesterday and a few earlier today (different reason). A very small number of jobs failed, but most didn’t. There is likely some small bit of cluster tuning that still needs to be done on our side, but that is unlikely to happen until Monday.

The important parts of the error message are “Segmentation fault” and “(core dumped)”. The “tput” lines are a technical side-effect in the error output and can be ignored.
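As a rough illustration of that triage, here is a minimal Python sketch that drops the harmless “tput” warnings from a pasted error report and keeps whatever remains. The function name `important_lines` and the shortened sample log are made up for this example; the real report contains the full path and command line.

```python
def important_lines(stderr_text):
    """Return the non-empty stderr lines that are not tput warnings."""
    kept = []
    for line in stderr_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines between messages
        if stripped.startswith("tput: No value for $TERM"):
            continue  # terminal-setup noise, safe to ignore
        kept.append(stripped)
    return kept


# Abbreviated stand-in for the error report (path and command shortened).
sample_log = """tput: No value for $TERM and no -T specified

tput: No value for $TERM and no -T specified

command.sh: line 95: 38467 Segmentation fault (core dumped) unicycler
"""

for line in important_lines(sample_log):
    print(line)
```

With the sample above, only the segmentation fault line is left, which is the part worth quoting in a bug report.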

What to do if a few unmodified reruns fail:

  1. Make sure that you have done some QA/QC on your data. FastQC is a great tool for diagnosing what could be done to improve the data quality, for example adaptor removal and quality trimming with Trimmomatic. It can also reveal other sequencing artifacts or contamination, both of which can lead to assembly issues. Help for interpreting FastQC output: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (see Documentation > Analysis modules) and https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/.
  2. Try tuning the alignment parameters to better fit the profile of your inputs. Unicycler usage documentation is here: https://github.com/rrwick/Unicycler
  3. Review a Galaxy GTN tutorial that covers the tool (bacterial assembly, but could give usage cues, even if assembling some other type of organism): https://galaxyproject.github.io/training-material/topics/assembly/tutorials/unicycler-assembly/tutorial.html
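Before the fuller QA/QC in step 1, a quick structural check of the inputs can rule out truncated or mismatched files. The sketch below is only an illustration and not a substitute for FastQC: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function names `count_fastq_records` and `pair_counts_match` are made up for this example.

```python
import gzip


def open_fastq(path):
    """Open a FASTQ file as text, transparently handling .gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_fastq_records(path):
    """Count four-line FASTQ records; raise ValueError if one is malformed."""
    count = 0
    with open_fastq(path) as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate stray blank lines
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"{path}: bad record near read {count + 1}")
            if len(seq) != len(qual):
                raise ValueError(
                    f"{path}: sequence/quality length mismatch at read {count + 1}"
                )
            count += 1
    return count


def pair_counts_match(r1_path, r2_path):
    """Return (counts_equal, n_r1, n_r2) for a pair of read files."""
    n1 = count_fastq_records(r1_path)
    n2 = count_fastq_records(r2_path)
    return n1 == n2, n1, n2
```

For the inputs in this thread, that would mean calling `pair_counts_match("fq1.fastq.gz", "fq2.fastq.gz")` on the trimmed Illumina pair and `count_fastq_records("lr.fastq")` on the nanopore reads. A `ValueError` or unequal pair counts would point to a damaged input rather than a server problem.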

A segmentation fault can have many causes, including a job that is too big to run with the given inputs/parameters. The dataset sizes seem OK, so if this turns out to be the reason, there is more going on content-wise. QA/QC might address that, so be sure not to skip it. I know the two read sets worked independently, but combining the two types of inputs will affect how the overall assembly is processed.


If the jobs still fail after trying the above and a few reruns/parameter tunings, please send in a bug report from one of the failed assemblies, and we can review the full error message, input data, settings, et cetera and offer more advice. Please include a link to this post in the comments and be sure to leave the error output and all input datasets undeleted. If the history contains multiple failed jobs (it should, if you have tested different parameters/reruns), leave all of those undeleted as well. Also, please leave your FastQC results undeleted so we don’t have to run the tool. That is usually where I start when troubleshooting analysis problems involving fastq data, unless something more obvious is clearly going wrong, such as tool usage problems or server issues.

Thanks!