"This is a new dataset and not all of its data are available yet"

Hello everyone,
I am using the Galaxy Europe web application, where I have imported some new data and am also working with data uploaded previously. My job run times have been very long recently, and now whenever I start a job I get the message “This is a new dataset and not all of its data are available yet”, which I have never seen before.

Previously, seeing “This job is waiting to run” was common but I have not seen this new message before. Is this normal?

Edit: Sorry, I forgot to mention that I am running a makeblastdb tool and a Filter Fasta tool, and I am also changing the datatype on uploaded data. All have long wait times.


Hi @R_J

Both of these states mean that the job is queued. The dataset will be grey in color.

  1. “This is a new dataset and not all of its data are available yet”

    • First staging state
    • Technically, the job is having its metadata interpreted
    • Why? To sort the job into the appropriate cluster queue
    • This may happen so fast that you don’t see it in the UI. Or, if the server is busy, this sub-step may take longer.
  2. “This job is waiting to run”

    • Second staging state
    • Technically, the job has been successfully added to a cluster queue
    • Why? This is a “ready” state, meaning it is now waiting until the target cluster has an available node
    • Queued time can vary (grey dataset), as can execution time (yellow dataset)
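As a rough illustration of the two staging states above, here is a minimal Python sketch. The state names `new`, `queued`, and `running` are my assumption about how these states are labeled internally, and `STATE_INFO`, `describe_state`, and `is_waiting` are hypothetical helpers made up for this example, not part of any Galaxy API.

```python
# Hypothetical lookup table: assumed state name -> (dataset color, meaning).
# The colors and messages come from the explanation above; the state
# names themselves are an assumption for illustration.
STATE_INFO = {
    "new": ("grey", "This is a new dataset and not all of its data are available yet"),
    "queued": ("grey", "This job is waiting to run"),
    "running": ("yellow", "The job is executing on a cluster node"),
}


def describe_state(state):
    """Return a one-line description of a dataset state."""
    color, meaning = STATE_INFO.get(
        state, ("unknown", "state not covered in this sketch")
    )
    return f"{state}: {color} dataset, {meaning}"


def is_waiting(state):
    """True for both staging states, i.e. the job has not started executing."""
    return state in ("new", "queued")


if __name__ == "__main__":
    for name in ("new", "queued", "running"):
        print(describe_state(name), "| waiting:", is_waiting(name))
```

The point is only that both grey states count as "waiting"; the difference is whether the job has been sorted into a cluster queue yet.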

If you think the job is “stuck” at either step: check the server status https://status.galaxyproject.org/ and ask here or in a Gitter chat to let us know about a potential new problem. General chat (any server): galaxyproject/Lobby - Gitter. EU server chat: usegalaxy-eu/Lobby - Gitter. The EU admin team may also reply with more details here in this post.

Different tools require different computational resources

  • Lower computational needs == individual jobs finish sooner, so nodes are freed up more quickly (to process the next job in their queue)
  • Higher computational needs == individual jobs take longer, so nodes are freed up at a slower pace

Other factors

  • Each public Galaxy server uses its own cluster resources, and those clusters vary in number, type, and size.
  • How many people are using a service at any particular time and the types of jobs they are running both impact processing performance.
  • How many jobs you are running concurrently also impacts your own processing speed. Some of your jobs will run, then some of other people’s jobs, then more of yours, and so on.
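The "some of yours, then some of other people's, then more of yours" pattern can be pictured with a toy round-robin scheduler. This is only a sketch of the general idea and not how any Galaxy server actually schedules jobs; `round_robin`, the `batch` parameter, and the user and job names are all hypothetical.

```python
from collections import deque


def round_robin(jobs_by_user, batch=1):
    """Toy scheduler: take up to `batch` jobs per user per turn, then rotate.

    Returns the order in which jobs would start, as (user, job) pairs.
    """
    # Only users that actually have jobs take part in the rotation.
    queues = deque(
        (user, deque(jobs)) for user, jobs in jobs_by_user.items() if jobs
    )
    order = []
    while queues:
        user, jobs = queues.popleft()
        for _ in range(min(batch, len(jobs))):
            order.append((user, jobs.popleft()))
        if jobs:
            # This user still has pending jobs, so they go to the back of the line.
            queues.append((user, jobs))
    return order


if __name__ == "__main__":
    pending = {
        "you": ["makeblastdb", "filter_fasta", "change_datatype"],
        "someone_else": ["mapping"],
    }
    for user, job in round_robin(pending):
        print(user, job)
```

In this toy model, submitting many jobs at once does not make any single one start sooner; your later jobs simply wait for their next turn.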

If you have work that is large or time-sensitive, the public Galaxy servers may not be appropriate.

I added a few tags to this topic that point to more details.
