Workflow Splitting Samples Automatically

I’m using workflows for the first time. I fed 59 datasets into the workflow, and each dataset has multiple #hashtags associated with it. Once the workflow started, I noticed that the datasets were being split into groups and processed in independent histories. I did select the “send the results to a new history” option, but I anticipated only 1 new history being generated; instead, this generated more than 10 new histories. Is this a result of selecting that option, or is it caused by something inherent in the #hashtags associated with the datasets? It does appear to have grouped the datasets based on the tags, since this was not the order the samples were listed in within the original history.

I’m trying to delete all of these histories on the “user history” page by selecting them all and clicking “delete permanently”, but the browser just seems to freeze up and I get an error: “Grid refresh failed”. Not sure how to clean this mess up x_x



Ok, this is unexpected as far as I know.

Would you please save back a copy of the history with the input datasets in it, plus the workflow that created this? Copying datasets does not use up more quota. Then generate a share link for both and send them to me in a direct message. I’d like to try to reproduce what happened and share it with the developers. This is odd enough that we’d like to track down what is happening soon, since we are finalizing the next release.

For the rest of the data you want to purge, go ahead and purge histories in batch: under the history menu’s gear icon, pick “Saved Histories”. The view you are using now is more compute intensive; the other view is just a list.

Either way, purging many histories at once will overload the server a bit (which is why the “Grid refresh failed” message comes up). Wait for that message to appear (this is important: the action is still being processed), then click the “Galaxy” name/icon at the top left to reload the page. The server will refresh and you can do other operations, including purging more histories. The database will catch up in a background process.
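As an alternative to the batch UI, histories can also be purged one at a time through Galaxy’s REST API, which avoids overloading the grid view. A minimal sketch, assuming a Galaxy server URL and an API key (the `base` and `hdr` names are placeholders, and the `deleted=true` query and `purge` payload follow the public histories API):

```python
def purge_request(base_url, history_id):
    """Build the DELETE request (URL + payload) for permanently
    purging one history via Galaxy's /api/histories endpoint."""
    url = f"{base_url.rstrip('/')}/api/histories/{history_id}"
    payload = {"purge": True}
    return url, payload

# Hypothetical usage: loop over your deleted histories and purge
# them one at a time instead of selecting hundreds in the UI.
#
# import requests
# base = "https://usegalaxy.org"          # placeholder server
# hdr = {"x-api-key": "YOUR_API_KEY"}     # placeholder key
# for h in requests.get(f"{base}/api/histories?deleted=true",
#                       headers=hdr).json():
#     url, payload = purge_request(base, h["id"])
#     requests.delete(url, json=payload, headers=hdr)
```

The same loop can be written with BioBlend’s `HistoryClient` if you prefer a wrapper over raw HTTP calls.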

Thanks, and I’ll watch for your share links!

Unfortunately, I deleted the histories before getting your reply. However, I did find that re-running the workflow with “send the results to a new history” set to no prevented any new histories from being created, so the strange result was definitely related to that option. I’ll share my input samples and workflow with you, and maybe you can try running it on your end to see if it reproduces the error?


Yes, that is exactly what we need. Your original output (the multiple histories) shouldn’t be needed. The test will be to see if/how the behavior can be reproduced with the “send to new history” workflow runtime option, then to follow up with a remedy.

Thank you again for reporting the problem and sharing the test case!

Ok, turns out this is expected behavior. Splitting into “one history per input” is a function that was put in place before dataset collections and tags were introduced. It works fine for people who have ~2-5 inputs, but not so well for 50+. We might update the workflow execution help text to make this type of “send to new history” output result clearer.

So, you’ll need to use dataset collections, or stick with individual dataset inputs and be OK with all the history outputs. The new histories are all given the same name, which the user specifies at runtime, and there isn’t a clear way to do better: there is no way to know how many inputs there are before the workflow executes, and asking the user to enter a new name for each would be tedious anyway.

I hadn’t invoked a workflow like this before (or it was long ago…), so I didn’t recognize what was going on. But @dannon confirmed this is all working correctly.

Hope that helps to explain what is going on!