Long Job using Flye and quota usage disappeared

Hello,

I am trying to assemble a plant genome from ONT data with Flye. Before launching the assembly with all my data (about 37 Gb in fastqsanger.gz), I tried using half of it, and the job finished successfully after 2 days with a decent genome size. I then launched the assembly with all the available sequences (37 Gb in fastqsanger.gz), and it has now been running for 5 days. I am wondering whether I am using more resources than allowed for this tool or whether it is still running. What worries me is that my quota usage now indicates 0%, suggesting there might be a bug somewhere?

Thank you very much.

Ben

Hi @BBOACHON,
Did your analysis finish successfully? Sorry for the late response.

Regards

Hi, no, I cancelled it yesterday after it had been running for 2 weeks. I thought the job was requesting too much memory and time. I was finally able to run the job in 24 hours on our lab server. I have also tried to run an annotation with Funannotate, but it is probably taking way too long as well, and some plant databases are not available (VIRIDIPLANTAE).

Kind regards.

How many cores are you using? It will probably be necessary to fine-tune this on the Galaxy server.

Regards

Hello, to run Flye on our server I have been using 10 to 24 threads, and it was using about 300 GB of RAM. On the Galaxy server I don’t think we can tune these parameters, right?
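For readers who want to reproduce a local run like the one described above, the command line could be assembled roughly as follows. This is only a sketch: the input file `reads.fastq.gz`, the output directory `flye_out`, the helper name `build_flye_command`, and the choice of the `--nano-raw` read type are assumptions for illustration, not details from this thread. Only the thread count (10 to 24) comes from the post.

```python
import shlex


def build_flye_command(reads, out_dir, threads=24):
    """Build (but do not run) a Flye command line for raw ONT reads.

    Hypothetical helper: the read type (--nano-raw) and the paths are
    assumptions; adjust them to your own data and Flye version.
    """
    return [
        "flye",
        "--nano-raw", reads,        # assumed read type: raw ONT reads
        "--out-dir", out_dir,       # directory where Flye writes its output
        "--threads", str(threads),  # the post reports using 10 to 24 threads
    ]


cmd = build_flye_command("reads.fastq.gz", "flye_out", threads=24)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where Flye is installed and enough RAM is available (about 300 GB in the case reported here).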

As for the Funannotate job, it has been running on the Galaxy server since December 26, so I guess it is stuck because it is using more RAM than allocated. Also, I could not use the database closest to my organism (a plant), although several plant databases are listed as available. I got this error message: “[Dec 25 01:14 PM]: ERROR: solanales_odb10 busco database is not found, install with funannotate setup -b solanales_odb10”, and finally launched it using EUKARYOTA.
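On a local installation, the fix suggested by that error message could be scripted roughly like this. It is a minimal sketch: the helper name `build_busco_setup_command` is hypothetical, and the command itself is taken verbatim from the error text above. On a public Galaxy server the databases are managed by the administrators, so this applies only to your own machine.

```python
import shlex


def build_busco_setup_command(lineage):
    """Build (but do not run) the setup command quoted in the error message.

    Hypothetical helper; `funannotate setup -b <lineage>` is the command
    that the Funannotate error text itself recommends.
    """
    return ["funannotate", "setup", "-b", lineage]


setup_cmd = build_busco_setup_command("solanales_odb10")
print(shlex.join(setup_cmd))
```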

Hi, I have a similar problem/question, and although the topic is marked as solved, I am not sure whether you found a solution for handling these “big” assemblies with Galaxy-Flye. I am trying to assemble about 20 Gbp of an ONT soil metagenome, and the process has been running since December 31 (10 days ago). I am not sure whether I should keep waiting, or whether there was an error in this process and I should cancel it so as not to waste resources.
Thank you in advance.

Hello, unfortunately my problem with Flye was not solved… The job was still running after a long time, and I decided to stop it. The same happened with Funannotate: databases were missing and the job was taking forever. I have run into this several times: jobs are not shown as failed but as still running. When using another instance (Use Galaxy France) I had the same problems, but the jobs would fail after a while with this error message: “This job was terminated because it used more memory than it was allocated”. I reported the problem and they allocated more memory, but it failed again. You may try asking for a larger RAM allocation for the tool, but from my experience I concluded that the project was probably oversized for the Galaxy servers, and I finally did it on the command line using our lab server. At least Galaxy was good for teaching the basics of assembly pipelines!

Hi @Andres_Esteban_Marco @BBOACHON,

We are working on dynamically optimizing the resources assigned to Flye according to the inputs.

Is your process still running @Andres_Esteban_Marco?

Regards

Thank you for the answer. This is great news.
Is there a way to see whether jobs are paused or have failed even though they have been shown as running for weeks on Galaxy Europe? On other instances, when a job takes too long, it fails.

Concerning the same issue with Funannotate, are you going to try resource optimization too? Or should I open another topic for it? It’s a very convenient tool.

Regards

Hi! Regarding the first question, you can ask in the Gitter channel about the state of your process. And yes, if you think Funannotate needs it, I’ll work on it. It would be very useful if you could provide us with some details about the resources that Funannotate requires depending on the input file (something similar to this).

Regards

Hello, thank you for your answer. Unfortunately I could not find performance examples for Funannotate. I am not sure this helps, but this is what I found in the manual (--memory: RAM to use for Jellyfish, default: 50G; CPUs: 2 by default; genome size is about ∼275.42 Mb).
I am trying to install Funannotate on our server, but it is quite difficult. If I succeed, I will give it a try with my data and report the performance to you.

Regards

Hi, you probably figured out by yourself that I was not giving pertinent info, since the Jellyfish mentioned in the Funannotate manual is a tool, not the organism… Sorry for that… Still learning :smile:

Hello, thanks for your answer. I just checked (today) and the job is still running.

Best regards

We updated the Funannotate databases and provided more cores to this tool.

You can see how many cores and how much memory each tool gets here: infrastructure-playbook/tool_destinations.yaml at master · usegalaxy-eu/infrastructure-playbook · GitHub

Hello,

Oh, that is great. I just noticed that my first job has ended, so the updated resources might have helped.

Thank you very much for your time.

Ben

Lovely, let us know if you need anything more. We are very interested in making assembly work seamlessly. We are currently working on this training for the VGP project; maybe that is also interesting for you.