cleanup script does not purge

Dear Galaxy support team,
I have questions regarding the cleanup of datasets, libraries and histories using the provided scripts.
I have a local Galaxy (19.05) instance running with PostgreSQL. After creating some libraries, I am now trying to clean up those libraries and all the datasets involved. I first deleted the libraries in the Galaxy web interface, along with the histories and their associated datasets. After that, I ran the scripts/cleanup_datasets/cleanup_datasets.py script from the Galaxy root with -r to remove files from disk (later also with -f to force a retry) and -4 to purge libraries. Its standard output reports that all libraries were purged and the associated datasets marked as deleted. However, in the browser I can still undelete all of these libraries and then use them to add files. I tried various combinations of options with the script and also ran the shell wrapper scripts that come with it. Some of the datasets in database/files were removed, but many subdirectories are still full of datasets.
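
For reference, here is a minimal sketch of running the cleanup_datasets.py passes in sequence rather than -4 on its own. It assumes the flag meanings shown by the script's --help in 19.x releases (-1 delete userless histories, -2 purge histories, -3 purge datasets, -4 purge libraries, -5 purge folders, -6 delete datasets, -d minimum age in days, -r remove from disk); please verify them on your install. The pass order, the extra_args slot for the config file, and the choice to add -r only to the purge passes are assumptions, not an official recipe.

```python
"""Sketch: build (not run) cleanup_datasets.py commands, one per pass.

Assumption: flag meanings follow the script's --help on 19.x releases.
Verify locally before running any of the printed commands.
"""

SCRIPT = "scripts/cleanup_datasets/cleanup_datasets.py"  # relative to the Galaxy root

# (flag, description, is_purge_pass). As we read the script, purging libraries
# (-4) marks their datasets deleted but does not remove the files; that is left
# to the purge-datasets pass (-3), which only touches datasets already deleted.
PASS_ORDER = [
    ("-1", "delete userless histories", False),
    ("-2", "purge histories", True),
    ("-4", "purge libraries", True),
    ("-5", "purge folders", True),
    ("-6", "delete datasets", False),
    ("-3", "purge datasets", True),
]


def build_commands(days=0, remove_from_disk=True, extra_args=()):
    """Return one argv list per pass, in PASS_ORDER.

    days: only objects last updated at least this many days ago are handled,
    so 0 means "everything already deleted". extra_args is a hypothetical slot
    for the config-file argument, whose form differs between Galaxy releases.
    """
    commands = []
    for flag, _description, is_purge in PASS_ORDER:
        argv = ["python", SCRIPT, *extra_args, flag, "-d", str(days)]
        if remove_from_disk and is_purge:
            argv.append("-r")  # only the purge passes get -r in this sketch
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them.
    for argv in build_commands():
        print(" ".join(argv))
```

Printing instead of executing keeps the sketch safe to try; a larger days value keeps a grace period during which deleted items can still be recovered.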

I was wondering whether you have seen this behaviour before, or if I am missing a crucial step in the process.

Thank you !!
Annabel

This part is expected: Galaxy never actually deletes rows from the database, so the library record will always remain there (only flagged as deleted/purged).

The remaining files, however, are a problem. Are you sure those datasets have no "links", i.e. no history dataset that still uses them?

Thanks for your quick reply! To my knowledge, all histories were removed before I cleaned up the libraries, but I will look into this. In any case, I would expect the -6 argument of cleanup_datasets.py to take care of these associative "links". Is that correct, or did I misunderstand what these links are?

The script is designed to "clean up": it only operates on data that users have marked as deleted, and it will not delete data out from under users who still reference it. So I think it is entirely possible that some users are still using the datasets in question (e.g. in histories), which is why the script refuses to remove the physical files.
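
To make that rule concrete, here is a simplified model of it as we understand it: a base dataset (which owns the file) can only be marked deleted once every history copy (HDA) and library copy (LDDA) pointing at it is deleted, and only then is its file a candidate for removal. The class and function names are hypothetical and this is an illustration, not Galaxy's actual code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Association:
    """Hypothetical stand-in for an HDA (history copy) or LDDA (library copy)."""
    kind: str
    deleted: bool = False


@dataclass
class Dataset:
    """Hypothetical stand-in for the base dataset row that owns the file on disk."""
    id: int
    deleted: bool = False
    associations: List[Association] = field(default_factory=list)


def has_active_links(ds: Dataset) -> bool:
    # Any association that has not been deleted still "uses" the file.
    return any(not a.deleted for a in ds.associations)


def mark_deleted_if_unused(ds: Dataset) -> bool:
    # Roughly what a "delete datasets" pass does: flag the base dataset only
    # when nothing references it any more. Returns the resulting flag.
    if not has_active_links(ds):
        ds.deleted = True
    return ds.deleted


def may_remove_file(ds: Dataset) -> bool:
    # Only a deleted dataset with no live association may lose its file.
    return ds.deleted and not has_active_links(ds)


if __name__ == "__main__":
    ds = Dataset(42, associations=[
        Association("ldda", deleted=True),   # library copy already deleted
        Association("hda", deleted=False),   # still present in someone's history
    ])
    mark_deleted_if_unused(ds)
    print(may_remove_file(ds))  # False: the history copy keeps the file alive
    ds.associations[1].deleted = True
    mark_deleted_if_unused(ds)
    print(may_remove_file(ds))  # True
```

In this model, deleting a library alone never frees a file that a live history copy still references.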

I recommend verifying this by finding the ID of a problematic dataset and querying the database for any HDA (history dataset association) or LDDA (library dataset dataset association) that uses it.
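
A minimal sketch of that check, assuming Galaxy's history_dataset_association and library_dataset_dataset_association tables with their dataset_id and deleted columns. To stay self-contained it runs against a hypothetical in-memory SQLite toy database; on a real instance you would pass a psycopg2 connection to PostgreSQL instead, with placeholder="%s".

```python
import sqlite3


def active_links(conn, dataset_id, placeholder="?"):
    """Return the non-deleted HDA and LDDA rows that still reference a dataset.

    placeholder is "?" for sqlite3 and "%s" for psycopg2.
    """
    queries = {
        "hda": (
            "SELECT id, history_id FROM history_dataset_association "
            f"WHERE dataset_id = {placeholder} AND NOT deleted"
        ),
        "ldda": (
            "SELECT id, library_dataset_id FROM library_dataset_dataset_association "
            f"WHERE dataset_id = {placeholder} AND NOT deleted"
        ),
    }
    cur = conn.cursor()
    links = {}
    for kind, sql in queries.items():
        cur.execute(sql, (dataset_id,))
        links[kind] = [tuple(row) for row in cur.fetchall()]
    return links


def make_demo_db():
    """Hypothetical toy database with only the columns the queries need."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history_dataset_association "
                 "(id INTEGER, history_id INTEGER, dataset_id INTEGER, deleted BOOLEAN)")
    conn.execute("CREATE TABLE library_dataset_dataset_association "
                 "(id INTEGER, library_dataset_id INTEGER, dataset_id INTEGER, deleted BOOLEAN)")
    # Dataset 42: one live history copy, one deleted history copy, one deleted library copy.
    conn.executemany("INSERT INTO history_dataset_association VALUES (?, ?, ?, ?)",
                     [(1, 10, 42, 0), (2, 11, 42, 1)])
    conn.execute("INSERT INTO library_dataset_dataset_association VALUES (5, 20, 42, 1)")
    return conn


if __name__ == "__main__":
    print(active_links(make_demo_db(), 42))
```

Any row returned under "hda" or "ldda" is a live reference that would keep the file on disk.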

This script is helpful for querying the Galaxy database.

This is an alternative cleanup script that operates directly on PostgreSQL and offers slightly different features.

I also suggest switching to gp_cleanup.py, which is much faster.

Note that there were some very recent bug fixes related to purging:


I expect they will be in 19.09.

Any information about datasets that you expect to be deleted but that are not is welcome. An issue on https://github.com/galaxyproject/galaxy would be a good place for it, or just ask on https://gitter.im/galaxyproject/admins