MethylDackel running time too long

Guy_Haim · May 24, 2021, 8:24am

Hi, I submitted three datasets to the MethylDackel tool and it has been running for the past two days (other users said it took them 2-3 hours to get results). Does anyone know what the problem is?

Thanks

bjoern.gruening · May 24, 2021, 8:34am

Hi @Guy_Haim,

I see this in your logs:

[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_load] build FASTA index.
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:0-1000000!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:1000000-2000000!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:3000001-4000001!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:4000001-5000001!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:5000001-6000001!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:6000001-7000001!
Note that the output will be truncated!
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:10000002-11000002!
Note that the output will be truncated!

Are you supplying your own index? Which reference genome are you using?
Ciao,
Bjoern
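
For a long log like the one above, a short script can summarize which sequence names the tool failed to find in the reference. This is only a minimal sketch: the function name `missing_sequences` and the sample text are made up for illustration, and the pattern matches just the `[fai_fetch_seq]` message as it is worded in this log.

```python
import re
from collections import Counter

# Matches the "not found" message exactly as it appears in the log above.
MISSING_SEQ = re.compile(r'\[fai_fetch_seq\] The sequence "([^"]+)" not found')


def missing_sequences(log_text):
    """Count how often each sequence name is reported as not found."""
    return Counter(MISSING_SEQ.findall(log_text))


# A few lines copied from the log above, used as sample input.
example_log = """[fai_load] build FASTA index.
[fai_fetch_seq] The sequence "chr1" not found
[fai_fetch_seq] The sequence "chr1" not found
faidx_fetch_seq returned -2 while trying to fetch the sequence for tid chr1:0-1000000!
Note that the output will be truncated!
"""

for name, count in missing_sequences(example_log).items():
    print(f"{name}: reported missing {count} time(s)")
```

If every reported name carries the same prefix (here `chr`), that may point to a naming difference between the alignments and the reference rather than to a damaged file.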

Guy_Haim · May 24, 2021, 8:36am

Yes, my own index.

13: GRCh38.primary_assembly.genome.fa uncompressed

Do you think I should use another index?

bjoern.gruening · May 24, 2021, 8:41am

Any reason you can not use …

[image]

Guy_Haim · May 24, 2021, 9:00am

Not that I can think of (except that I used it for the previous stages of processing the reduced representation bisulfite sequencing files I got).

Anyway, I’ll try…

bjoern.gruening · May 24, 2021, 9:02am

Please let me know if this speeds up your processing.

Guy_Haim · May 24, 2021, 9:06am

No problem, thanks.

Guy_Haim · May 24, 2021, 11:55am

Hi again, it’s still running. How long does it take normally?
Also, do you think I should re-do the previous stages? This time with this ref genome:

jennaj · June 4, 2021, 4:27pm

Hi @Guy_Haim

Using the same exact reference genome (build + version) throughout analysis is important.

The natively indexed version of hg38 was sourced from UCSC. If that is the same as what you used for upstream steps (in Galaxy or not), all should be fine.

Other reference genome sources can be used, however, some data adjustments are usually needed to avoid problems. This FAQ has more details:

Mismatched Chromosome identifiers (and how to avoid them)
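
One way to spot such a mismatch before running a long job is to compare the sequence names in the reference FASTA with the names in the alignment header. The sketch below assumes the header is available as text (for example, the output of `samtools view -H`). All function names and the sample data are hypothetical and only illustrate the comparison.

```python
def fasta_names(fasta_lines):
    """Return sequence names: the text after '>' up to the first whitespace."""
    return [
        line[1:].split()[0]
        for line in fasta_lines
        if line.startswith(">") and line[1:].strip()
    ]


def sam_header_names(header_lines):
    """Return the SN: values from the @SQ lines of a SAM/BAM text header."""
    names = []
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def compare_names(fasta, header):
    """Report the names that appear on one side only."""
    fasta_set, header_set = set(fasta), set(header)
    return {
        "only_in_alignments": sorted(header_set - fasta_set),
        "only_in_fasta": sorted(fasta_set - header_set),
    }


# Made-up example: the reference uses "1", "2" while the alignments use "chr1", "chr2".
fasta = [">1 dna:chromosome\n", "ACGT\n", ">2 dna:chromosome\n", "ACGT\n"]
header = ["@HD\tVN:1.6\tSO:coordinate\n", "@SQ\tSN:chr1\tLN:4\n", "@SQ\tSN:chr2\tLN:4\n"]

report = compare_names(fasta_names(fasta), sam_header_names(header))
print(report)
```

Two empty lists would suggest the identifiers agree. Names that appear on only one side are the ones to check against the FAQ.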

Guy_Haim · June 4, 2021, 6:35pm

Hi, thanks for replying…
Actually, I was told to use the built-in genome index in Galaxy instead of the index I created on my own. I used it for the upstream steps and for the MethylDackel tool as well, but it didn’t work.

Still, it has been running for almost two weeks now.

If you can think of anything else I can do I would appreciate it.

Thanks again,
Guy

jennaj · June 4, 2021, 7:24pm

@Guy_Haim

Large genomes, such as mammalian and some plant genomes, tend to exceed the resources at public Galaxy servers when supplied as a custom genome (FASTA) from the history. Why? Because the genome may need to be indexed before the tool runs. I saw that you are now using the built-in index, so that will resolve part of the problem.

If you are still having issues, check these three things first:

1. The upstream inputs were successfully uploaded/created (green dataset), and contain content. It is possible for a job to technically be successful but contain no output (or if a BAM, sometimes just the header if no hits passed the mapping criteria).
2. Make sure your reference annotation is a match for the reference genome and is formatted correctly. I added some tags to your post that link to much Q&A about how to ensure/address both.
3. Check your parameters versus the reference annotation content. Some tools allow you to specify which portions of the annotation to consider, others just assume that content is present (pre-set mandatory content). The tool form help plus linked documentation can help with this.

Other than that, if a job is queued, it is waiting for resources to become available. How long a job queues depends on how busy the server is and how many jobs you have that are queued (grey) + executing (yellow). If all that doesn’t help, maybe @bjoern.gruening or @gallardoalba can help more since you are working at the EU server.

Guy_Haim · June 4, 2021, 8:04pm

OK, I’ll try…
Thank you so much for the help.

Guy

bjoern.gruening · June 4, 2021, 8:47pm

MethylDackel has some problems at the moment; we are updating the tool to the latest version. Please check again in 24 hours and try the latest version. Sorry for the inconvenience.

jennaj · June 5, 2021, 5:55pm

2 posts were split to a new topic: Searching the tool panel – Example at UseGalaxy.eu for tool MethylDackel

Lewis · June 11, 2021, 7:43am

Hi @bjoern.gruening, I also appear to be having problems with long MethylDackel run times. I am using the BAM outputs from the Bismark mapper. For some of these, MethylDackel finishes quickly (a few minutes in some cases!), but for most of the others it has been running for a long time (some for more than a few days). Is this normal? Do you have an idea of how long these jobs should normally take?

Many thanks,
Lewis

bjoern.gruening · June 11, 2021, 7:58am

Have you tried the new version as well? 0.5.2?

Lewis · June 11, 2021, 8:17am

Yep, all of the jobs are running using version 0.5.2+galaxy0.

Lewis · June 15, 2021, 8:39am

Still no luck with the new version, however: pretty much all of the jobs have been running for more than 2 days now. Do you have any ideas why they might be taking so long to complete?

Guy_Haim · June 15, 2021, 11:01am

I resubmitted my files and they have been running for the past 8 days with the new version…