I’d like to generate workflows for some of my histories as a record of which tools were run, since I need to free up some space in my account. However, for larger histories I’m able to view the list of datasets and tools, but subsequently creating a workflow results in a blank window. Are there any limitations to this process?
Henk
Hi @Henk
No problems with extracting workflows are known right now.
The basic flow usually goes like this. Maybe it helps to find where things went wrong? Then we can follow up more.
- Complete the work in the history
- Clean up the data in that history
- Delete any files that were experimental or otherwise won’t be generated by the final workflow
- Tip: To filter the history by the inputs and outputs associated with any single dataset, expand the dataset and click on the “Dataset Map icon”. FAQ: Different dataset icons and their usage
- Extract the workflow. You can also do some curation in that view – this usually involves unchecking some tools. (A scripted version of this step is sketched just after this list.)
- Open your new workflow in the editor, and see if any warnings are shown, and address those.
- Add annotation using the “Best Practices” guide in the editor. Seems tedious but future you will appreciate it!
- Try running the workflow, and send the outputs to a new history (gear icon). The inputs will be copied over by default, and that new history will represent everything needed to recreate the work using your workflow.
- Repeat as needed. Once you have it all working, you can purge your original history, keep your final workflow output history, and consider going back into the workflow to adjust how the intermediate datasets are handled (hide or delete them if not needed for downstream steps, and similar).
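If you prefer to script the extraction step, here is a minimal sketch using BioBlend, the Python client for the Galaxy API. The server URL, API key, and history name below are placeholder assumptions, not values from this topic:

```python
# Minimal sketch of the extraction step over the Galaxy API, using
# BioBlend ("pip install bioblend"). The server URL, API key, and
# history name are placeholders -- substitute your own values.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# Look up the cleaned-up history by name (assumes a unique name).
history = gi.histories.get_histories(name="my-finished-analysis")[0]

# Extract a workflow from the jobs in that history. job_ids= and
# dataset_hids= can be passed to curate which steps are included,
# mirroring the unchecking you would do on the extraction form.
workflow = gi.workflows.extract_workflow_from_history(
    history_id=history["id"],
    workflow_name="my-finished-analysis (extracted)",
)
print(workflow["id"], workflow["name"])
```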
If that is not enough, we can help more. Screenshots of any warnings the workflow editor pops up would be a good place to start, plus the server URL where you are working.
Let’s start there
Xref
- FAQ: How can I reduce quota usage while still retaining prior work (data, tools, methods)?
- Hands-on: Extracting Workflows from Histories / Using Galaxy and Managing your Data
- GTN Materials Search query=workflow → most resources related to workflows
Hi @Jennaj,
Tried to follow your suggestions. Deleted all files that I don’t need for the workflow, but still no pop-up indicating that my workflow was generated and ready for editing. The window that usually contains this info remains blank. Still stuck at point 3. Hope this reply makes sense.
Hi @Henk
Extracting a workflow should work, so I’d like to follow up until this works for you. Did you solve this already or do you still need help?
What I’ll need:
- Which server are you working at? URL please
- If you share a copy of the history, I’ll take a look directly.
- FAQ: Sharing your History
- That ^^ can be in this topic directly, or in a private message here.
- Or, if you are working at UseGalaxy.org, post your username and the name of the history and I can look at it that way. Give the history a unique name please, not the default.
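(If clicking through the Share or Publish page is inconvenient, the same settings can be flipped over the API; a rough sketch with BioBlend, where the URL, key, and history name are placeholders:)

```python
# Rough sketch: make a history shareable via the API with BioBlend.
# The server URL, API key, and history name are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = gi.histories.get_histories(name="my-problem-history")[0]["id"]

# importable=True enables the share link; published=True also lists the
# history publicly on the server.
gi.histories.update_history(history_id, importable=True, published=True)
```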
The other option for saving work is to create a History Archive and download it, or put it someplace you can stash data, such as your cloud storage. You can always upload it again later – from your local desktop or with a URL. That would save all of the data directly.
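A scripted version of the archive route, again with BioBlend; the history name and output filename are placeholders:

```python
# Sketch: build a history archive on the server, then download it.
# export_history() with wait=True blocks until the archive is ready.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = gi.histories.get_histories(name="my-problem-history")[0]["id"]

jeha_id = gi.histories.export_history(history_id, gzip=True, wait=True)
with open("my-problem-history.tar.gz", "wb") as outf:
    gi.histories.download_history(history_id, jeha_id, outf)
```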
Thanks, and sorry this was a problem for you. We’ll figure it out!
Hi @jennaj,
I’m using the usegalaxy.eu server. The following link gets you to the history: Galaxy. Hope this will help solve this issue.
Thanks,
Henk
Hi @Henk
I don’t have access to the datasets, so I can’t do anything with them. You could adjust that on the Share or Publish page.
And now I’m wondering if that is your problem as well. Was this an imported history from another account? Or imported input datasets from another account? Just some of them?
I can see the chain of copies when clicking on the “i” icon on the input datasets in the two collections. It looks like part of the history was copied from somewhere else. The FastQC/MultiQC runs are where the stricter data permissions were added in. The other tools (mapping, Cutadapt) do have shared permissions.
This is a bit curious, so I’d like to keep following up until it is resolved. Specifically, I’d like to open an issue ticket about providing more user feedback when workflow extraction fails due to data permissions … and maybe workflow extraction shouldn’t fail at all due to data permissions, since the workflow should only care about how the data was created, not the actual data files themselves. But I need an example.
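(For reference, the chain of copies and the accessibility of each dataset can also be inspected over the API; a rough sketch with BioBlend, where the history ID is a placeholder and some field names, e.g. “accessible”, may vary across Galaxy releases:)

```python
# Rough sketch: walk a history and report, for each dataset, whether it
# is accessible to you and which tool created it (following copies).
# The history ID is a placeholder; field availability can vary by
# Galaxy release.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
history_id = "PLACEHOLDER_HISTORY_ID"

for item in gi.histories.show_history(history_id, contents=True):
    if item["history_content_type"] != "dataset":
        continue  # skip collections; inspect their members separately
    details = gi.datasets.show_dataset(item["id"])
    # follow=True walks back through copies to the job that made the data
    prov = gi.histories.show_dataset_provenance(history_id, item["id"], follow=True)
    print(item["hid"], item["name"], details.get("accessible"), prov.get("tool_id"))
```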
So – could you do two things:
- Fully share/publish the existing history
- Clarify more about the sharing/import steps you applied to the history: did you copy data just within your account or across accounts? From the same server, or imported as an archive from another server? We can bring in the EU admins, who can look at this in full detail as needed, but I’d like to get the general situation clarified first.
Thanks!
Thanks for your reply. I shared the history on the published histories page. Hope you can access it now. The datasets were uploaded from my own laptop via ‘Upload Data’. It looks like there were multiple copies of the same fq.gz files in this history. Purged all but one of each sample. Didn’t help getting the workflow generated, though…
Should it work if I copy the datasets to a new history, start the process again, and then create a workflow from that?
Thanks so much,
Henk
Yes, please try that. It was one of the things that I tested, and it was how I found out that one of the FastQC/MultiQC steps (the first pair in the original history) had different permission levels.
Also, remember to “uncheck” any tool steps that no longer have the original input file available. There were at least two of these – both seemed to be reformatting steps for reference data. The alternative would have been to leave that data in the history if you wanted to include the reformatting steps in the workflow.
The default instructions sort of assume that everything is still available… I could clarify better in the instructions/FAQ that “original state” histories are needed if you don’t want to curate what to extract into the workflow. Failed steps can be removed, but not any of the starting data.
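If it helps, copying the starting datasets into a fresh history can also be scripted; a rough sketch with BioBlend, where the history names and the filename filter are assumptions:

```python
# Rough sketch: copy the starting fastq datasets into a fresh history,
# ready for re-running the tools and extracting a clean workflow.
# History names and the ".fq.gz" filter are placeholder assumptions.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
old_id = gi.histories.get_histories(name="original-history")[0]["id"]
new_hist = gi.histories.create_history(name="rerun-for-extraction")

for item in gi.histories.show_history(old_id, contents=True):
    if item["history_content_type"] == "dataset" and item["name"].endswith(".fq.gz"):
        gi.histories.copy_dataset(new_hist["id"], item["id"], source="hda")
```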
Let me know what happens, then I’ll check again if still needed. Thanks!
Hi,
Started off by copying the datasets from the original history to a new one. FastQC already returns flags/failures on some of the datasets. I’ll share the history.
Henk
If FastQC doesn’t think the data is “active”, that means the input reads are in a deleted/purged state. Odd. Maybe some data was accidentally removed when pruning the history earlier? A share link to that history would be great.
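(A quick API check for that state, with BioBlend; the history ID is a placeholder:)

```python
# Quick check: flag any datasets in a history that are deleted/purged.
# The history ID is a placeholder. Depending on the server, deleted
# items may only show up when deleted=True is also passed.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.eu", key="YOUR_API_KEY")
history_id = "PLACEHOLDER_HISTORY_ID"

for item in gi.histories.show_history(history_id, contents=True):
    if item["history_content_type"] != "dataset":
        continue
    d = gi.datasets.show_dataset(item["id"])
    if d.get("deleted") or d.get("purged"):
        print(f"hid {item['hid']}: {item['name']} deleted={d.get('deleted')} purged={d.get('purged')}")
```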
Dear Jennaj,
Uploaded the fastq files to a new history on the usegalaxy.org server and managed to rerun the history that previously failed to produce a workflow. This now works properly. The only setback is the FeatureCounts tool: there are no built-in genomes to select. My fastq files are mapped against hg38 using HISAT2.
Hi @Henk
Do the BAMs inside that collection have the database hg38 assigned to them? That is needed for the tool to “detect” that the built-in index for the paired gene annotation is available.
If you mapped in Galaxy, then hg38 would be assigned by the HISAT2 tool. If you uploaded the data, then it might still need to be assigned by you, after confirming that the versions of hg38 are an exact match. The alternative is to set up a custom database key to match your distinct version of hg38, then to also get the annotation that fits it. Details are in here → FAQ: Extended Help for Differential Expression Analysis Tools
How to check: click into collection 262 and expand the BAMs to confirm the database assignment. If you need to add it for some reason, click the pencil icon for the collection and make the database assignment in batch. (A scripted check is sketched just below.)
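A rough sketch of that check with BioBlend, for anyone who prefers the API; the history ID is a placeholder, and “genome_build” is the field that holds the database assignment:

```python
# Rough sketch: list the database (dbkey) assigned to each BAM inside
# the collections of a history. A genome_build of "?" means unassigned,
# which is what keeps the built-in annotation from being offered.
# The history ID is a placeholder.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")
history_id = "PLACEHOLDER_HISTORY_ID"

for item in gi.histories.show_history(history_id, contents=True):
    if item["history_content_type"] != "dataset_collection":
        continue
    coll = gi.histories.show_dataset_collection(history_id, item["id"])
    for element in coll["elements"]:
        hda = gi.datasets.show_dataset(element["object"]["id"])
        print(item["name"], element["element_identifier"], hda.get("genome_build"))
```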
Hi Jennifer,
My BAM files already had hg38 assigned. To my surprise, the built-in gene annotation appeared after I changed the ‘Alignment file’ input from dataset collection to single dataset and back, yesterday. 7 out of 8 input files have now been aligned.
Hi @Henk
It sounds like the form needed a reset to “see” the data. I’m going to try to reproduce it. Maybe this can be mitigated better.
Glad this worked out, and thanks for posting back the solution!
Update: I wasn’t able to reproduce the issue, even after the recent release that was just processed. Hopefully this has resolved for you as well!