Extract workflow with deleted intermediate steps: Freeing up quota space and saving prior work

I am relatively new to Galaxy, and I am sorry if this is a very basic question. My main question is: what is the intended way to keep a permanent record of what has been done to which files in a complicated analysis?

I have run the analysis, and now it is time to delete the bulky files and make space for another analysis. Is the intended method for keeping a record of tools and parameters just the “history”? My history is relatively messy because I tried several options of a tool before settling on the final analysis. It also contains intermediate steps whose large output files I had to delete for space reasons. I thought I could extract a workflow from the history, clean up the abandoned branches, and keep the workflow as a record of the analysis. However, the deleted intermediate files/steps are not part of the extracted workflow. Is it possible to extract a workflow that includes deleted intermediate files/steps?

Hi @BagiM,
What is your goal for maintaining the stages of the analysis that are not part of the final workflow?

Regards

Thanks for the response. I probably wasn’t very clear.
The analysis steps look like this:
sample1_file1 + sample1_file2 >step1> sample1_file3 >step2> sample1_file4 >step3> sample1_file5
repeat steps 1-3 for all samples.
combine file5 of all samples into one large file6
run analysis on file6

Steps 1-3 are simple operations like combining forward and reverse reads, trimming reads, converting fastq to fasta, adding a prefix to read names, and sampling reads. For space reasons, once I had file6 I needed to delete file3 and file5 for all samples. Then I ran the final analysis on file6.
If I now try to extract a workflow, the intermediate steps that produced file3 and file5 are not present, and the workflow is not a continuous chain of steps.

Hi @BagiM

Please review this FAQ for help about managing data, freeing up quota space, and saving prior analysis work: Account quotas

For the extracted workflow problems, it sounds like you will need to directly edit the workflow.

Workflows are a record of the tools/actions used to create data. Workflows do not contain data themselves. If you deleted intermediate datasets, the tools/actions that created them might not have been extracted, or they may have been extracted but left disconnected from the other tools/actions within the workflow.
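One way to see which of the two cases applies is to download the extracted workflow as a `.ga` file (it is JSON) and list the tool steps that have no incoming connections. The sketch below is only a rough check. It assumes the usual `.ga` layout, where each entry under `steps` has a `type` and an `input_connections` mapping, and the toy workflow and its tool IDs are made up for illustration. A tool that legitimately takes no dataset input will also be reported, and a step with only some inputs connected will not be.

```python
import json


def find_unconnected_tool_steps(workflow):
    """Return tool steps that have no incoming connections.

    `workflow` is the dict loaded from an exported .ga file.
    """
    unconnected = []
    for key, step in workflow.get("steps", {}).items():
        if step.get("type") != "tool":
            continue  # data inputs and other non-tool steps never have inputs
        if not step.get("input_connections"):
            unconnected.append(
                {
                    "step": int(key),
                    "tool_id": step.get("tool_id"),
                    "name": step.get("name"),
                }
            )
    return sorted(unconnected, key=lambda s: s["step"])


# Toy workflow for illustration only; tool IDs are hypothetical.
EXAMPLE = {
    "a_galaxy_workflow": "true",
    "name": "toy example",
    "steps": {
        "0": {"id": 0, "type": "data_input", "name": "Input dataset",
              "tool_id": None, "input_connections": {}},
        "1": {"id": 1, "type": "tool", "name": "Trim reads",
              "tool_id": "hypothetical_trimmer",
              "input_connections": {"input": {"id": 0, "output_name": "output"}}},
        "2": {"id": 2, "type": "tool", "name": "FASTQ to FASTA",
              "tool_id": "hypothetical_converter",
              "input_connections": {}},
    },
}

if __name__ == "__main__":
    # For a real file: workflow = json.load(open("my_workflow.ga"))
    for step in find_unconnected_tool_steps(EXAMPLE):
        print(step["step"], step["name"], step["tool_id"])
```

Any step it reports is a candidate for reconnecting in the workflow editor. If an expected tool does not appear in the file at all, it was not extracted and would need to be added back by hand.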

Tips:

  1. Workflows can be created “from scratch” or extracted. Sometimes a bit of tuning in the workflow editor is needed when doing either until it runs as expected.
  2. You can reconnect tools or add back any tools/actions that are missing from the workflow by editing it.
  3. Intermediate datasets can be deleted while executing a workflow to save space. This is a per-tool “action”. Tool actions are listed in the far right panel of the workflow editor when a tool is selected on the canvas.
  4. If you add or adjust workflow steps, you may need to disconnect/reconnect all the “noodles” between tools again to reset the workflow’s metadata. Reconnect starting from the inputs down through the analysis in the order of execution.
  5. Once this workflow is working as expected, try running it again with the original inputs all in one new clean history. That way you can create a “history archive” and download it, along with the associated workflow, for a complete record of your work. All of your datasets (that were not deleted while running the workflow!) will be inside the archive and can be used any way you want. You can even import that history archive + workflow into your own local Galaxy, to view the data in context, but I wouldn’t try running the workflow locally unless you set up a more complicated local Galaxy, or decide to use a cloud Galaxy.
  6. If you are an academic researcher, and need a bit more temporary quota space, most public Galaxy servers will grant that. Each handles requests a bit differently. Let us know if you cannot find where/how to request temporary quota space at the server you are working at (post back the server URL).
  7. Please see the FAQ above for more details about most of this, plus the tutorials below for how to extract/edit workflows.
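If you prefer to script the “history archive plus workflow” download from tip 5 rather than click through the interface, the BioBlend Python library can do it through the Galaxy API. The following is a minimal sketch, not a tested recipe for any particular server. The function name `save_analysis_record` and the output file naming are my own, and `gi` is assumed to be a BioBlend `GalaxyInstance` already connected with your server URL and API key. History export behaviour can differ between Galaxy releases, so treat the details as assumptions to verify.

```python
import os


def save_analysis_record(gi, history_id, workflow_id, out_dir):
    """Download a history archive and its workflow into out_dir.

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance.
    Returns the path of the downloaded history archive.
    """
    os.makedirs(out_dir, exist_ok=True)

    # Ask Galaxy to build the archive; wait=True blocks until it is ready.
    # Deleted datasets are left out, hidden ones are kept.
    jeha_id = gi.histories.export_history(
        history_id,
        gzip=True,
        include_hidden=True,
        include_deleted=False,
        wait=True,
    )

    # Stream the finished archive to a local file (hypothetical file name).
    archive_path = os.path.join(out_dir, f"history_{history_id}.tar.gz")
    with open(archive_path, "wb") as outf:
        gi.histories.download_history(history_id, jeha_id, outf)

    # Save the workflow definition (.ga JSON) next to the archive.
    gi.workflows.export_workflow_to_local_path(
        workflow_id, out_dir, use_default_filename=True
    )
    return archive_path
```

The history and workflow IDs are the encoded IDs Galaxy shows in its URLs and API responses. Keeping the archive and the `.ga` file together gives you both the surviving datasets and the record of tools and parameters in one place.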

GTN tutorials that cover creating or editing a workflow are at the link below. Scroll down past the analysis tutorials; the ones you will be most interested in are listed lower down and match the “workflow” search term.

https://training.galaxyproject.org/training-material/search?query=workflow

Hope that helps!