Dear community,
I run meryl for k-mer coverage counting using the following workflow: Galaxy | Europe | Accessible Workflow | Kcov.
With a paired-end Illumina dataset it worked well at first and I got results.
When I repeated the same workflow with more data, the job did not fail (no error message), but the resulting .meryldb file is not suitable for histogram creation in the next step, which produced an empty file.
Does anybody know what the problem is? Too much data? Too little quota?
All the best,
Thomas
Meryl
Dataset Information
Number 22
Name Meryl on data 19: read-db.meryldb
Created Monday Jul 5th 4:38:13 2021 UTC
Filesize 6.9 MB
Dbkey ?
Format meryldb
History Content API ID
11ac94870d0bb33a65184c01b231c955
History API ID
e0706b5a535dc122
UUID a86e5aae-b668-409d-9f9e-1bdec2668a33
Full Path /data/dnb03/galaxy_db/files/a/8/6/dataset_a86e5aae-b668-409d-9f9e-1bdec2668a33.dat
Tool Parameters
Input Parameter Value
Operation type selector count-kmers
Count operations Count: count the occurrences of canonical k-mers
Input sequences
19: Collapse Collection on data 18, data 17, and others
K-mer size selector provide
K-mer size 21
Job Information
Galaxy Tool ID: toolshed.g2.bx.psu.edu/repos/iuc/meryl/meryl/1.3+galaxy3
Command Line
export GALAXY_MEMORY_GB=$((${GALAXY_MEMORY_MB:-8192}/1024)) && meryl count k=21 memory=$GALAXY_MEMORY_GB threads=${GALAXY_SLOTS:-1} /data/dnb03/galaxy_db/files/d/0/9/dataset_d09368a3-8058-42f6-bd3e-0ba9db38611a.dat output read-db.meryl && echo 'K-mer size: 21' && tar -zcf read-db.meryldb read-db.meryl
Tool Standard Output
K-mer size: 21
Tool Standard Error
Found 1 command tree.
Counting 111 (estimated) billion canonical 21-mers from 1 input file:
sequence-file: /data/dnb03/galaxy_db/files/d/0/9/dataset_d09368a3-8058-42f6-bd3e-0ba9db38611a.dat
SIMPLE MODE
21-mers
→ 4398046511104 entries for counts up to 65535.
→ 64 Tbits memory used
119496818480 input bases
→ expected max count of 477987273, needing 14 extra bits.
→ 56 Tbits memory used
15 TB memory needed
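For anyone checking the SIMPLE MODE numbers above, here is a small sketch that reproduces them. It assumes one counter per possible 21-mer (4^21 entries), 16 bits per counter for counts up to 65535, the 14 extra bits the log mentions, and binary units (1 Tbit = 2^40 bits). These assumptions are my reading of the log, not meryl's documented formula, but they do give the same figures.

```shell
#!/usr/bin/env bash
# Reproduce the SIMPLE MODE memory estimate from the log (k=21).
# Assumptions (not taken from meryl's documentation): 16-bit base counters,
# binary units (1 Tbit = 2^40 bits), 8 bits per byte.
k=21
entries=$((4 ** k))      # one counter per possible 21-mer
base_bits=16             # counts up to 65535 fit in 16 bits
extra_bits=14            # "needing 14 extra bits" in the log
tbit=$((1 << 40))        # bits per Tbit under the binary-unit assumption

base_tbits=$(( entries * base_bits / tbit ))
extra_tbits=$(( entries * extra_bits / tbit ))
total_tb=$(( (base_tbits + extra_tbits) / 8 ))

echo "entries: $entries"
echo "base:    $base_tbits Tbits"
echo "extra:   $extra_tbits Tbits"
echo "total:   $total_tb TB"
```

Under these assumptions simple mode would need far more than the 110 GB allowed for the job, which fits with meryl configuring complex mode instead.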
COMPLEX MODE
prefix-bits  #-of-prefix  struct-memory  kmers/prefix  segs/prefix  min-memory  data-memory  total-memory
1 2 P 285 MB 13 GM 17 MS 8192 B 142 GB 142 GB
2 4 P 278 MB 7122 MM 8905 kS 16 kB 139 GB 139 GB
3 8 P 271 MB 3561 MM 4341 kS 32 kB 135 GB 135 GB
4 16 P 264 MB 1780 MM 2115 kS 64 kB 132 GB 132 GB
5 32 P 257 MB 890 MM 1030 kS 128 kB 128 GB 129 GB
6 64 P 250 MB 445 MM 500 kS 256 kB 125 GB 125 GB
7 128 P 243 MB 222 MM 243 kS 512 kB 121 GB 121 GB
8 256 P 237 MB 111 MM 118 kS 1024 kB 118 GB 118 GB
9 512 P 231 MB 55 MM 57 kS 2048 kB 114 GB 115 GB
10 1024 P 225 MB 27 MM 27 kS 4096 kB 111 GB 111 GB
11 2048 P 221 MB 13 MM 13 kS 8192 kB 107 GB 108 GB
12 4096 P 221 MB 7122 kM 6680 S 16 MB 104 GB 104 GB
13 8192 P 227 MB 3561 kM 3231 S 32 MB 100 GB 101 GB
14 16 kP 245 MB 1780 kM 1559 S 64 MB 97 GB 97 GB
15 32 kP 289 MB 890 kM 752 S 128 MB 94 GB 94 GB
16 64 kP 383 MB 445 kM 362 S 256 MB 90 GB 90 GB
17 128 kP 578 MB 222 kM 174 S 512 MB 87 GB 87 GB
18 256 kP 976 MB 111 kM 84 S 1024 MB 84 GB 84 GB
19 512 kP 1780 MB 55 kM 41 S 2048 MB 82 GB 83 GB
20 1024 kP 3392 MB 27 kM 20 S 4096 MB 80 GB 83 GB Best Value!
21 2048 kP 6624 MB 13 kM 10 S 8192 MB 80 GB 86 GB
22 4096 kP 12 GB 7123 M 5 S 16 GB 80 GB 92 GB
23 8192 kP 25 GB 3562 M 3 S 32 GB 96 GB 121 GB
24 16 MP 50 GB 1781 M 1 S 64 GB 64 GB 114 GB
25 32 MP 101 GB 891 M 1 S 128 GB 128 GB 229 GB
26 64 MP 202 GB 446 M 1 S 256 GB 256 GB 458 GB
27 128 MP 405 GB 223 M 1 S 512 GB 512 GB 917 GB
28 256 MP 810 GB 112 M 1 S 1024 GB 1024 GB 1834 GB
FINAL CONFIGURATION
Estimated to require 107 GB memory out of 110 GB allowed.
Estimated to require 4 batches.
Configured complex mode for 107.359 GB memory per batch, and up to 4 batches.
Start counting with THREADED method.
Used 10.589 GB / 109.922 GB to store 0 kmers; need 0.000 GB to sort 0 kmers
Input complete. Writing results to 'read-db.meryl', using 10 threads.
finishIteration()--
Finished counting.
Cleaning up.
Bye.
Tool Exit Code: 0
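Since the job exits with code 0 even though the log says it stored 0 kmers, a check on the stderr text might help to catch this case before the histogram step. Below is a minimal sketch; the temporary log file and the `status` variable are only for illustration, and the sample lines are copied from the output above. It assumes the "to store 0 kmers" wording is stable, which I have only seen in this one run.

```shell
#!/usr/bin/env bash
# Hypothetical check: flag a meryl run that exited 0 but stored no k-mers.
# The sample log lines are copied from the Tool Standard Error above.
log=$(mktemp)
cat > "$log" <<'EOF'
Start counting with THREADED method.
Used 10.589 GB / 109.922 GB to store 0 kmers; need 0.000 GB to sort 0 kmers
Finished counting.
EOF

# Match exactly "store 0 kmers"; counts such as 10 or 100 do not match
# because the pattern requires a space directly before the single 0.
if grep -q 'to store 0 kmers' "$log"; then
    status="empty"
    echo "meryl reported 0 k-mers stored; the database is probably empty" >&2
else
    status="ok"
fi
echo "$status"
rm -f "$log"
```

In a real run you would point `grep` at the saved stderr of the meryl job instead of the sample file.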
Job API ID: 11ac94870d0bb33ad73bde806337fc6c
Dataset Storage
This dataset is stored in a Galaxy object store with id files10.
Inheritance Chain
Meryl on data 19: read-db.meryldb
Job Metrics
cgroup
Memory softlimit on cgroup 0 bytes
Was OOM Killer active? No
OOM Control enabled No
Max memory usage (MEM+SWP) 106.9 GB
Memory limit on cgroup (MEM+SWP) 8.0 EB
Max memory usage (MEM) 106.9 GB
Memory limit on cgroup (MEM) 110.0 GB
Failed to allocate memory count 0
CPU Time 21 minutes
core
Job Runtime (Wall Clock) 26 minutes
Job End Time 2021-07-05 19:04:55
Job Start Time 2021-07-05 18:38:25
Memory Allocated (MB) 112640
Cores Allocated 10
hostname
hostname vgcnbwc-worker-c125m425-7047.novalocal
AWS estimate
1.06 USD
This job requested 10 cores and 110 Gb. Given this, the smallest EC2 machine we could find is m4.10xlarge (160 GB / 40 vCPUs / Intel Xeon E5-2676 v3 (Haswell)). That instance is priced at 2.4 USD/hour.
Please note, that those numbers are only estimates, all jobs are always free of charge for all users.
Dataset peek
Compressed binary file