"This is a new dataset and not all of its data are available yet"

R_J · February 8, 2022, 4:27am

Hello everyone,
I am using the Galaxy Europe web application. I have imported some new data and am also working with data uploaded previously. My job run times have been very long recently, and now whenever I start a job I get the message “This is a new dataset and not all of its data are available yet”, which I have never gotten before.

Previously, seeing “This job is waiting to run” was common but I have not seen this new message before. Is this normal?

Edit: Sorry I forgot to mention that I am running a makeblastdb tool, a Filter Fasta tool, as well as changing the datatype on uploaded data. All have long wait times.

jennaj · February 8, 2022, 6:00pm

Hi @R_J

Both of these states mean that the job is queued. The dataset will be grey in color.

“This is a new dataset and not all of its data are available yet”
- First staging state
- Technically, the job is having its metadata interpreted
- Why? To sort the job into the appropriate cluster queue
- This may happen so fast that you don’t see it in the UI. Or, if the server is busy, this sub-step may take longer.

“This job is waiting to run”
- Second staging state
- Technically, the job has been successfully added to a cluster queue
- Why? This is a “ready” state, meaning it is now waiting until the target cluster has an available node
- Queued time can vary (grey dataset), as can execution time (yellow dataset)
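
To make the two queued states easier to compare, here is a minimal Python sketch that maps each one to the message and dataset color described above. The state names `new` and `queued`, the `StateInfo` class, and both helper functions are assumptions made for illustration; they are not taken from this thread and this is not Galaxy's own code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StateInfo:
    message: str  # text shown on the dataset in the history panel
    color: str    # dataset color while in this state
    meaning: str  # what is happening behind the scenes


# Assumption: "new" and "queued" are illustrative state names for the
# two staging states; the messages and colors come from the post above.
STATE_INFO = {
    "new": StateInfo(
        message="This is a new dataset and not all of its data are available yet",
        color="grey",
        meaning="first staging state: metadata is interpreted to pick a cluster queue",
    ),
    "queued": StateInfo(
        message="This job is waiting to run",
        color="grey",
        meaning="second staging state: in a cluster queue, waiting for a free node",
    ),
}


def is_waiting(state: str) -> bool:
    """Return True while the job is still in one of the two queued states."""
    return state in STATE_INFO


def describe(state: str) -> str:
    """Return a one-line summary for a state name."""
    info = STATE_INFO.get(state)
    if info is None:
        return f"{state}: not one of the two queued states described here"
    return f"{state} ({info.color}): {info.meaning}"


if __name__ == "__main__":
    for name in ("new", "queued", "running"):
        print(describe(name))
```

Either queued state simply means "keep waiting"; only once the dataset turns yellow has the job actually started executing.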

If you think the job is “stuck” at either queued step, check the server status at https://status.galaxyproject.org/, then ask here or in a Gitter chat so that we know about a potential new problem. General chat (any server): galaxyproject/Lobby. EU server chat: usegalaxy-eu/Lobby. The EU admin team may also reply with more details in this post.

Different tools require different computational resources

- Lower computational needs: individual jobs finish sooner, so nodes are freed up sooner to process the next job in their queue.
- Higher computational needs: individual jobs take longer, so nodes are freed up at a slower pace.

Other factors

- Each public Galaxy server uses its own cluster resources. The number, type, and size of the clusters vary.
- The number of people using a service at any particular time, and the types of jobs they are running, both affect processing performance.
- The number of jobs you are running concurrently also affects your own processing speed. Some of your jobs will run, then some of the other people’s jobs, then more of yours, and so on.
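
The "some of yours, then some of theirs" pattern in the last point can be pictured as a simple round-robin over per-user queues. The sketch below is only an illustration of that idea under simplified assumptions (one job per user per turn, a single shared pool of nodes); the function name `interleave_jobs` and the user and job labels are hypothetical, and the real scheduling on a public server is more involved than this.

```python
from collections import deque


def interleave_jobs(queues):
    """Take one job per user in turn until every queue is empty.

    `queues` maps a user name to that user's submitted jobs, in order.
    Returns the (user, job) pairs in the order they would start.
    """
    pending = {user: deque(jobs) for user, jobs in queues.items()}
    order = []
    # An empty deque is falsy, so the loop ends when all queues are drained.
    while any(pending.values()):
        for user, jobs in pending.items():
            if jobs:
                order.append((user, jobs.popleft()))
    return order


if __name__ == "__main__":
    submitted = {
        "you": ["makeblastdb", "filter_fasta", "change_datatype"],
        "someone_else": ["mapping"],
    }
    for user, job in interleave_jobs(submitted):
        print(user, job)
```

In this toy model, submitting many jobs at once does not make them all start sooner, because each of your jobs still waits its turn behind other people's work.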

If you have work that is large or time-sensitive, the public Galaxy servers may not be appropriate.

- Some people use multiple public Galaxy servers to distribute the work/data. One account at each is fine and expected.
- Some people decide to use their own Galaxy, for short or long-term needs. When running your own Galaxy, you control how data/job quotas and job prioritization are set up, plus what resources are attached.
- Ways to use Galaxy: Galaxy Platform Directory: Servers, Clouds, and Deployable Resources - Galaxy Community Hub
- The GVL Cloudman version is a single or multi-user choice, and AWS offers grants. See Amazon Web Services (AWS) - Galaxy Community Hub and AWS Programs for Research and Education.
- The AnVIL version is a single-user choice sponsored by NHGRI and is a pay-for-use Google Cloud platform. See AnVIL - Galaxy Community Hub.


I added a few tags to this topic that point to more details.
