How to delete files automatically during a workflow?

Is there any way to set up a workflow so that files generated by one module and used as input for a second tool are deleted after the second tool is done with them?

For example, I generate a fastq file with CutAdapt, then compress it to fastq.gz with Compress Files, and would like the input fastq files to be deleted permanently.

I’ve also read about scratch disks. If I run my workflow on the scratch disks, how do I transfer files between them and permanent storage?

Thanks

Hi @billy.l
Galaxy workflows can remove outputs that are not used in subsequent steps, but I don’t know whether workflows support deleting intermediate datasets that are used as inputs for subsequent steps. Maybe consider adding a non-propagating tag to the intermediate datasets you don’t need. After the workflow completes, filter the datasets by that tag using the advanced search/filter option, select all tagged datasets, and delete them. There must be a better option, but this should work, too.
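If you would rather script the tag-based cleanup than click through the history, here is a minimal sketch of the same idea. It assumes the BioBlend Python library, whose `histories.show_history(..., contents=True)` and `histories.delete_dataset(...)` calls are used below, and it assumes that the dataset records returned for a history include a `tags` list, which may depend on the server version. The tag name `intermediate` and the helper names `select_tagged` and `delete_tagged` are made up for illustration.

```python
from typing import Iterable


def select_tagged(datasets: Iterable[dict], tag: str) -> list:
    """Return the not-yet-deleted dataset records whose 'tags' list contains `tag`."""
    return [
        d for d in datasets
        if tag in (d.get("tags") or []) and not d.get("deleted", False)
    ]


def delete_tagged(gi, history_id: str, tag: str, purge: bool = False) -> list:
    """Delete every dataset in one history that carries `tag`; return their ids.

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance, or any object
    exposing the same `histories.show_history` / `histories.delete_dataset`
    methods. The presence of a 'tags' key in each record is an assumption.
    """
    contents = gi.histories.show_history(history_id, contents=True)
    doomed = select_tagged(contents, tag)
    for d in doomed:
        # purge=True requests permanent removal instead of a recoverable delete
        gi.histories.delete_dataset(history_id, d["id"], purge=purge)
    return [d["id"] for d in doomed]
```

To use it, create the connection with `bioblend.galaxy.GalaxyInstance(url=..., key=...)` using your own server URL and API key, then call `delete_tagged(gi, history_id, "intermediate")` once the workflow has finished. Permanent deletion with `purge=True` only works if the server admins allow users to purge datasets.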

I’m not sure what you mean by scratch disks. Tools in Galaxy (often) use dedicated storage for temporary data. Users do not have access to this storage or to the temporary working files, and it does not count toward the user quota. After a job completes, the output files go to the standard(?) storage. Users do not have control over job destinations, the storage used for temporary files, etc. This is the responsibility of the server admins. I hope I have not misunderstood your question.

Kind regards,
Igor

Thanks @igor. I guess I’ll have to make do with tags for now.

By scratch disks I meant the 1TB scratch storage source that appears below my used storage on the storage usage page. I’m wondering if I can send some of the intermediate files generated by my workflow to that storage.

Hi @billy.l
Thank you for the additional info. It looks like this feature (quota for scratch storage) is unique to the ORG server. Galaxy Europe and Galaxy Australia do not show scratch storage. I guess the scratch storage refers to storage used for temporary output files from active jobs. To the best of my knowledge, users cannot manage the scratch storage on Galaxy, and users cannot keep outputs of completed jobs on the scratch storage.
Kind regards,
Igor

Hi @billy.l
It seems I was wrong about the scratch disk. The ORG server provides a temporary storage increase for 30 days. Try User (the top Galaxy menu) > Preferences > Storage location.
Hope that helps.
Kind regards,
Igor
