Possible error in meryl

Dear community,

I ran meryl for k-mer coverage counting as part of the following workflow: Galaxy | Europe | Accessible Workflow | Kcov.

With a paired-end Illumina dataset it worked well at first, and I got results.

When repeating the same workflow with more data, the job did not fail (no error message), but the resulting .meryldb file was not usable for histogram creation in the next step: that step produced an empty file.

Does anybody know what the problem is? Too much data? Too little quota?

All the Best

Thomas

Meryl
Dataset Information
Number 22
Name Meryl on data 19: read-db.meryldb
Created Monday Jul 5th 4:38:13 2021 UTC
Filesize 6.9 MB
Dbkey ?
Format meryldb
History Content API ID
11ac94870d0bb33a65184c01b231c955
History API ID
e0706b5a535dc122
UUID a86e5aae-b668-409d-9f9e-1bdec2668a33
Full Path /data/dnb03/galaxy_db/files/a/8/6/dataset_a86e5aae-b668-409d-9f9e-1bdec2668a33.dat

Tool Parameters
Input Parameter Value
Operation type selector count-kmers
Count operations Count: count the occurrences of canonical k-mers
Input sequences

19: Collapse Collection on data 18, data 17, and others 

K-mer size selector provide
K-mer size 21

Job Information
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy3
Command Line

export GALAXY_MEMORY_GB=$((${GALAXY_MEMORY_MB:-8192}/1024)) && meryl count k=21 memory=$GALAXY_MEMORY_GB threads=${GALAXY_SLOTS:-1} /data/dnb03/galaxy_db/files/d/0/9/dataset_d09368a3-8058-42f6-bd3e-0ba9db38611a.dat output read-db.meryl && echo 'K-mer size: 21' && tar -zcf read-db.meryldb read-db.meryl

Tool Standard Output

K-mer size: 21

Tool Standard Error

Found 1 command tree.

Counting 111 (estimated) billion canonical 21-mers from 1 input file:
sequence-file: /data/dnb03/galaxy_db/files/d/0/9/dataset_d09368a3-8058-42f6-bd3e-0ba9db38611a.dat

SIMPLE MODE

21-mers
→ 4398046511104 entries for counts up to 65535.
→ 64 Tbits memory used

119496818480 input bases
→ expected max count of 477987273, needing 14 extra bits.
→ 56 Tbits memory used

15 TB memory needed

COMPLEX MODE

prefix     # of   struct   kmers/    segs/      min     data    total
  bits   prefix   memory   prefix   prefix   memory   memory   memory
     1     2  P   285 MB    13 GM    17 MS  8192  B   142 GB   142 GB
     2     4  P   278 MB  7122 MM  8905 kS    16 kB   139 GB   139 GB
     3     8  P   271 MB  3561 MM  4341 kS    32 kB   135 GB   135 GB
     4    16  P   264 MB  1780 MM  2115 kS    64 kB   132 GB   132 GB
     5    32  P   257 MB   890 MM  1030 kS   128 kB   128 GB   129 GB
     6    64  P   250 MB   445 MM   500 kS   256 kB   125 GB   125 GB
     7   128  P   243 MB   222 MM   243 kS   512 kB   121 GB   121 GB
     8   256  P   237 MB   111 MM   118 kS  1024 kB   118 GB   118 GB
     9   512  P   231 MB    55 MM    57 kS  2048 kB   114 GB   115 GB
    10  1024  P   225 MB    27 MM    27 kS  4096 kB   111 GB   111 GB
    11  2048  P   221 MB    13 MM    13 kS  8192 kB   107 GB   108 GB
    12  4096  P   221 MB  7122 kM  6680  S    16 MB   104 GB   104 GB
    13  8192  P   227 MB  3561 kM  3231  S    32 MB   100 GB   101 GB
    14    16 kP   245 MB  1780 kM  1559  S    64 MB    97 GB    97 GB
    15    32 kP   289 MB   890 kM   752  S   128 MB    94 GB    94 GB
    16    64 kP   383 MB   445 kM   362  S   256 MB    90 GB    90 GB
    17   128 kP   578 MB   222 kM   174  S   512 MB    87 GB    87 GB
    18   256 kP   976 MB   111 kM    84  S  1024 MB    84 GB    84 GB
    19   512 kP  1780 MB    55 kM    41  S  2048 MB    82 GB    83 GB
    20  1024 kP  3392 MB    27 kM    20  S  4096 MB    80 GB    83 GB  Best Value!
    21  2048 kP  6624 MB    13 kM    10  S  8192 MB    80 GB    86 GB
    22  4096 kP    12 GB  7123  M     5  S    16 GB    80 GB    92 GB
    23  8192 kP    25 GB  3562  M     3  S    32 GB    96 GB   121 GB
    24    16 MP    50 GB  1781  M     1  S    64 GB    64 GB   114 GB
    25    32 MP   101 GB   891  M     1  S   128 GB   128 GB   229 GB
    26    64 MP   202 GB   446  M     1  S   256 GB   256 GB   458 GB
    27   128 MP   405 GB   223  M     1  S   512 GB   512 GB   917 GB
    28   256 MP   810 GB   112  M     1  S  1024 GB  1024 GB  1834 GB

FINAL CONFIGURATION

Estimated to require 107 GB memory out of 110 GB allowed.
Estimated to require 4 batches.

Configured complex mode for 107.359 GB memory per batch, and up to 4 batches.

Start counting with THREADED method.
Used 10.589 GB / 109.922 GB to store 0 kmers; need 0.000 GB to sort 0 kmers

Input complete. Writing results to 'read-db.meryl', using 10 threads.
finishIteration()--

Finished counting.

Cleaning up.

Bye.

Tool Exit Code: 0
Job API ID: 11ac94870d0bb33ad73bde806337fc6c

Dataset Storage

This dataset is stored in a Galaxy object store with id files10.

Inheritance Chain
Meryl on data 19: read-db.meryldb

Job Metrics
cgroup
Memory softlimit on cgroup 0 bytes
Was OOM Killer active? No
OOM Control enabled No
Max memory usage (MEM+SWP) 106.9 GB
Memory limit on cgroup (MEM+SWP) 8.0 EB
Max memory usage (MEM) 106.9 GB
Memory limit on cgroup (MEM) 110.0 GB
Failed to allocate memory count 0
CPU Time 21 minutes
core
Job Runtime (Wall Clock) 26 minutes
Job End Time 2021-07-05 19:04:55
Job Start Time 2021-07-05 18:38:25
Memory Allocated (MB) 112640
Cores Allocated 10
hostname
hostname vgcnbwc-worker-c125m425-7047.novalocal
AWS estimate
1.06 USD
This job requested 10 cores and 110 Gb. Given this, the smallest EC2 machine we could find is m4.10xlarge (160 GB / 40 vCPUs / Intel Xeon E5-2676 v3 (Haswell)). That instance is priced at 2.4 USD/hour.
Please note, that those numbers are only estimates, all jobs are always free of charge for all users.

Dataset peek

Compressed binary file


Hi @fruitphytodoc,
this looks like a memory issue. Could you try running meryl count on the collection instead of collapsing the collection into a single data file? After that, you can merge the resulting collection of .meryldb files using the k-mer operation union-sum.
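For anyone running meryl directly rather than through the Galaxy tool form, the per-file approach can be sketched roughly as below. This is only an illustration under assumptions: the function name `count_and_merge`, the `MERYL` variable (which lets you substitute `echo` for a dry run) and the file names are hypothetical, and the `memory=` and `threads=` options from the Galaxy command line are left out for brevity.

```shell
# Sketch: count k-mers per input file, then merge the databases with union-sum.
# MERYL defaults to the meryl binary on PATH; set MERYL=echo for a dry run.
MERYL="${MERYL:-meryl}"

count_and_merge() {
  if [ "$#" -lt 3 ]; then
    echo "usage: count_and_merge K MERGED_DB INPUT..." >&2
    return 1
  fi
  k="$1"
  merged="$2"
  shift 2
  n=$#
  # Replace each input in the argument list by the database built from it.
  while [ "$n" -gt 0 ]; do
    fq="$1"
    shift
    db="${fq%.*}.meryl"          # reads_1.fastq -> reads_1.meryl
    $MERYL count k="$k" "$fq" output "$db" || return 1
    set -- "$@" "$db"
    n=$((n - 1))
  done
  # union-sum keeps every k-mer seen in any database and adds up its counts.
  $MERYL union-sum output "$merged" "$@"
}
```

A call such as `count_and_merge 21 merged.meryl reads_1.fastq reads_2.fastq` would then build one database per file and a merged `merged.meryl`, so each counting step only has to hold a fraction of the data.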

Regards

Dear @gallardoalba,
thanks, I will try this! I will reply once I have done it.

All the Best
Thomas


Hi @hxr, @gallardoalba,

regarding the problem “Possible error in meryl”: it still persists, even with a smaller dataset, both with and without collapsing the datasets.

When using Illumina reads for k-mer counting, the tool creates an output that can be used as input for the histogram calculation, but the histogram step then produces an empty file. How could we look into this issue?
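The job log posted earlier in this thread already shows the symptom: meryl reports `to store 0 kmers` and still exits with code 0. A minimal sketch for spotting this in a saved copy of the tool's standard error, assuming the log wording is the same as quoted in this thread (the function name and the file name `meryl.stderr` are placeholders):

```shell
# Sketch: flag a meryl run that finished without storing any k-mers.
# Matches the wording of the log quoted in this thread ("to store 0 kmers");
# other meryl versions may phrase this line differently.
stored_zero_kmers() {
  log="$1"
  grep -q 'to store 0 kmers' "$log"
}

# Usage (meryl.stderr is a placeholder file name):
# if stored_zero_kmers meryl.stderr; then
#   echo "meryl counted no k-mers; check the input files" >&2
# fi
```

The function only succeeds when the pattern is present, so it can be used directly in an `if` before the histogram step is started.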

Regards

Hi @fruitphytodoc,
the error seems to be related to compressed input files; I opened an issue about it: Meryl: wrong outputs with compressed FASTQ files · Issue #3801 · galaxyproject/tools-iuc · GitHub. Could you try decompressing the fastq.gz files and relaunching the analysis? You can decompress your files this way:
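For files kept outside a Galaxy history, a minimal command-line sketch of the decompression step could look like this. It is an assumption-laden illustration and not the Galaxy procedure itself: the function name `decompress_fastq` and the paths are hypothetical, and it simply relies on the standard `gzip` tool.

```shell
# Sketch: write an uncompressed copy of a gzip-compressed FASTQ file.
# The compressed input is left untouched; paths are placeholders.
decompress_fastq() {
  src="$1"
  dest="${src%.gz}"              # reads.fastq.gz -> reads.fastq
  if [ "$src" = "$dest" ]; then
    echo "not a .gz file name: $src" >&2
    return 1
  fi
  gzip -t "$src" || return 1     # fail early on a corrupt or non-gzip file
  gzip -dc "$src" > "$dest"
}
```

Calling `decompress_fastq reads_1.fastq.gz` would leave `reads_1.fastq` next to the original, which can then be uploaded and passed to meryl in place of the compressed file.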

Regards