BUSCO not working for a particular dataset

Different tools use different algorithms, so differing runtimes are expected during the “yellow” executing stage.

How long jobs queue (the “gray” stage) is a separate matter and depends on a few factors: tool choice, how busy the server is, and so on. There are many topics on this forum that cover those details.

You could try more filtering. I gave some ideas in a prior reply above. Maybe input just the longest sequences, or just the top N lines of the file. With a smaller input, the tool might actually run, or at least report more informative error logs.
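If it helps, here is a minimal sketch of the "just the longest sequences" idea. It assumes the input is plain FASTA; the function names (`read_fasta`, `longest_records`, `format_fasta`) are made up for illustration and are not part of Galaxy or of the tool.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without the leading ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may be wrapped over many lines
    if header is not None:
        yield header, "".join(chunks)


def longest_records(records: Iterable[Record], n: int) -> List[Record]:
    """Return the n longest records, longest first."""
    return sorted(records, key=lambda rec: len(rec[1]), reverse=True)[:n]


def format_fasta(records: Iterable[Record], width: int = 60) -> str:
    """Render records back to FASTA text, wrapping sequences at `width`."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

To use it on a real file, pass an open file handle to `read_fasta`, keep, say, the 1000 longest records with `longest_records`, and write the result of `format_fasta` to a new file to upload as a test input. The cutoff of 1000 is arbitrary; start small and increase it until the tool starts to struggle.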

In the end, the problem could be the data itself, of course, but that’s what I would try, along with generating some more statistics. Get the sequence lengths, cluster them to see if any patterns show up, compare against what the tool authors document about what the tool can handle, etc. There isn’t a single solution.
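As a rough illustration of the length statistics mentioned above, the sketch below summarizes a list of sequence lengths and groups them into fixed-width bins. Fixed-width binning is only a simple stand-in for "clustering", and the function names and the bin width are assumptions for the example.

```python
from collections import Counter
from statistics import median
from typing import Dict, List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no lengths given")


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Basic summary statistics for a list of sequence lengths."""
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "median": median(lengths),
        "n50": n50(lengths),
    }


def bin_lengths(lengths: List[int], bin_width: int) -> Dict[int, int]:
    """Count sequences per bin; each key is the lower bound of its bin."""
    counts = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(counts.items()))
```

A large pile of sequences in the lowest bins, or a very small N50 relative to what the tool documentation describes, would be a hint that filtering by length is worth trying first.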

What you have now doesn’t look like a technical bioinformatics problem exactly – instead, this is a scientific problem to solve. Meaning, you need to give the tool something that it knows how to process :slight_smile:

This particular tool is trying to cluster and annotate your data by comparing it against a reference, then generating some stats about what it found. It might simply be overwhelmed by the number of input sequences, or the data may have no matches, or too many non-specific matches, or something else, such as too many short reads that it cannot process. You’ll need to investigate.

From what you’ve explained and tested so far, all of this would probably happen outside of Galaxy too, unless the computer had infinite resources and the tool itself could actually keep track of the intermediate data.

Assembly and annotation processes are super sensitive to data content. So, you could try mapping the assembly against a reference (same species, or a different one), reviewing it in a browser like IGV along with any known annotation (GTFs), and exploring that way too. You could even just BLAT or BLAST+ a few sequences, or any suspect sequences, at UCSC or NCBI to see what happens. None of that is exact; it is exploratory.
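For picking "a few sequences" to paste into BLAT or BLAST, a reproducible random sample is one option. This is only a sketch: `sample_records` is a hypothetical helper, and it assumes the sequences are already loaded as (header, sequence) pairs.

```python
import random
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header, sequence)


def sample_records(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Pick up to k records at random; the fixed seed makes the pick repeatable."""
    pool = list(records)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))
```

Five or ten sequences is usually plenty for a first look. Mixing in a few of the longest and a few of the shortest sequences alongside the random picks can show whether one size class behaves differently.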

Hope this works out for you, and feel free to post back what you find here! Might help others who are reading later on or following along now.