I started a Unicycler run two days ago (07/08/2023) on a set of 87 paired-end read sets. Everything went fine (80 ok, 4 with errors) until yesterday morning, when I noticed two jobs were still running and one was still waiting after quite a long time. Past issues solved themselves with time and the problem was me being impatient, so I decided to wait. Today those two jobs are still running and the other one is still waiting, so I suspect something’s off. Should I keep on waiting?
If the jobs are yellow (executing), it is usually best to let them complete so you get the full job logs at the end.
For gray datasets: when running larger batches of work at the public servers, it isn’t unusual for some of those jobs to queue. Some of your jobs run, then some of other people’s jobs, then more of yours, and so on until everything completes. Try not to rerun, since rerunning puts a job back at the end of the queue.
And … it has been several hours now since you wrote in… is the progress better? If not, would you please confirm that you are working at UseGalaxy.org? Or describe where else you are working.
As usual, thanks for the quick reply. I’m aware that by using a public server I have to share the resources with everyone, and I’m perfectly fine with that. What I found curious is that the Unicycler call started quite fast and the 80 assemblies were done in the expected amount of time, but then the two jobs I mentioned started executing (yellow) and have remained that way since.
The progress is the same as yesterday, and yes, I’m working at UseGalaxy.org.
That does seem very odd. Jobs should time out after about 48 hours in the executing state (yellow) at UseGalaxy.org; that limit doesn’t include queue time (gray). So something weird is going on.
Do you want to share back a link to the history? I’ll take a look, and you can unshare after. See: Sharing your History.
The history shows the two executing (yellow) jobs were started today. So, I’m guessing that you decided to rerun. Let those jobs run, or maybe better, start over with at least addressing the first tip below.
Two tips, both important.
When creating a collection, try not to include characters like a dot in the collection element identifiers. Elements are the files inside collection folders.
Why this matters is complicated; it can’t be worked around and may not impact every tool, but avoid it anyway, especially if things go wrong.
Use the function to strip off file extensions when creating the collection, or use Relabel collection identifiers if the data is already loaded (a small sketch of this kind of renaming follows this tip). There are a few more ways to do this, but it looks like you already understand how collections generally work! For others reading, please see Search GTN Materials.
Dashes and underscores are OK, and your base file names otherwise look fine.
The assigned datatype is what matters, i.e. it is what tells the tool the data format. Only one tool wrapped in Galaxy that I know of actually interprets the dataset_name.extension directly (Prokka, due to the way the original tool was written). If you plan to use that tool later, search this forum for the how-to.
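For anyone who tidies file names before uploading, here is a minimal Python sketch of the idea, done locally outside Galaxy. The file name and the clean_identifier helper are hypothetical, purely for illustration:

```python
from pathlib import Path

def clean_identifier(filename: str) -> str:
    """Strip the read-file extension and replace any remaining dots,
    so the name is safe to use as a collection element identifier."""
    # Handle compound extensions like .fastq.gz first.
    for ext in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if filename.endswith(ext):
            filename = filename[: -len(ext)]
            break
    else:
        filename = Path(filename).stem  # drop whatever the last extension is
    # Dashes and underscores are fine in identifiers; dots are not.
    return filename.replace(".", "_")

print(clean_identifier("isolate.07_R1.fastq.gz"))  # -> isolate_07_R1
```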
Consider running some QA, either at the start or after weird behavior. Assembly is sensitive to read content, and most assembly tools are also super picky in other ways (they require intact pairs, etc.).
FastQC and Fastq info are good choices, and they check different things (read quality and intact pairs, respectively); a quick local pair check along the same lines is sketched after this tip.
That QA catches most data content/format problems at the start, or at least gives some context for assemblies that fail or need parameter tune-ups, or maybe more trimming.
Running these before and after trimming shows what that trimming step actually did (if anything!).
Your current jobs are not failing for quality reasons, but this is good to know about.
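As a rough local stand-in for the intact-pairs part of that check, the two FASTQ files of a pair can be compared by read count. A minimal Python sketch, with hypothetical file names:

```python
import gzip

def count_reads(path: str) -> int:
    """Count reads in a FASTQ file (gzipped or plain): 4 lines per read."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for _ in handle) // 4

# Hypothetical file names; unequal counts mean the pairs are broken.
r1 = count_reads("isolate_07_R1.fastq.gz")
r2 = count_reads("isolate_07_R2.fastq.gz")
print(f"R1: {r1} reads, R2: {r2} reads, pairs intact: {r1 == r2}")
```

Equal counts don’t guarantee matched read order, so Fastq info remains the proper check; this just flags the most common breakage quickly.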
Thanks for the reply and for checking my history. I don’t know whether they were started on Friday or not, but I haven’t done anything since the original Unicycler call (by the way, things are still unchanged). I usually rename files to avoid spaces or dots, but I didn’t know the extension was also read as part of the name; I’ll remove the file extensions for the next run.
I normally do QC before and after trimming as you suggest, so I also think it isn’t a quality issue.
I haven’t tried it yet, but would it be possible to download the finished assemblies even if the collection hasn’t been processed as a whole? If so, I might do that and retry the assembly with only the affected reads.
The jobs look failed now, so you could try rerunning.
Yes, you can filter a collection to exclude empty or failed datasets: see tools in the group Collection Operations.
Do you mean create a new collection with just those datasets that didn’t process with the batch? If yes, you can go into the hidden datasets tab in the history, unhide the inputs, create a new collection out of them, then run those. Later on, if that works, you could combine the two collections together again (original results + rerun results).
That said, the usual way is to rerun the failed elements from within the collection, making sure to check the option to replace the results back into the original output collection.
And you could get fancy with filtering by element identifiers, but with so few datasets and no workflow considerations, just doing it directly might be faster. (To answer the download question: yes, finished elements can be downloaded individually; a scripted sketch follows below.)
So that’s a few ways to go forward. Hope one works out for you!
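If pulling down the completed assemblies while the rest of the collection is unfinished sounds useful, the Galaxy API can download datasets individually. A minimal sketch using the BioBlend Python client, assuming a history named "Unicycler batch" and your own API key (both hypothetical here; a key can be created under User → Preferences → Manage API Key):

```python
import os
from bioblend.galaxy import GalaxyInstance

# Hypothetical server/key/history name; substitute your own values.
gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

history = gi.histories.get_histories(name="Unicycler batch")[0]
contents = gi.histories.show_history(history["id"], contents=True)

# Grab only datasets that finished successfully ('ok' state); queued,
# running, or failed siblings are simply skipped.
os.makedirs("assemblies", exist_ok=True)
for item in contents:
    if item.get("history_content_type") == "dataset" and item.get("state") == "ok":
        gi.datasets.download_dataset(
            item["id"], file_path="assemblies", use_default_filename=True
        )
```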
Thanks for the support. I tried rerunning, but only two jobs finished. I opted to do the assembly locally instead; those runs worked fine, so maybe it was something random on the server.
Anyhow, thanks again!