1. Tool A generates a secret key and encodes some content using that key. The content is exported as a dataset in the history.
2. Tool B takes the encoded dataset from the history and decodes it using the same secret key as in step 1.
Obviously, it should be impossible to get hold of the secret key as a normal Galaxy user.
Is there support for something like this in Galaxy? There is already the SECRET_KEY in the Galaxy config file, but can it be securely transferred to the jobs? And how would something like this work in combination with job scheduling?
Thinking about this more closely, I don’t think the tool can in any case be trusted with the SECRET_KEY of the Galaxy server itself, as it might send it somewhere, so that is at least not an option.
I was thinking of this as a way to validate that the dataset has not been tampered with between Tool A and Tool B. The full idea is that the main dataset itself would be stored unencrypted, but alongside a secret-key-encoded MD5 hash (or similar) of the dataset contents. Tool B could then decode the MD5 hash using the shared secret key and use it to validate that the dataset contents have not been tampered with.
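Keying a hash with a secret is exactly what an HMAC does, so Python’s standard hmac module covers this idea without any custom encryption. A minimal sketch, assuming the shared key already reaches both tools by some safe route (which is the open question in this thread); SHA-256 is swapped in for MD5, and sign_file/verify_file are hypothetical names:

```python
import hashlib
import hmac


def sign_file(path: str, key: bytes) -> str:
    """Tool A: return a hex HMAC-SHA256 tag over the file contents."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as fh:
        # Stream in 1 MiB chunks so large datasets need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def verify_file(path: str, key: bytes, expected_tag: str) -> bool:
    """Tool B: recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_file(path, key), expected_tag)
```

Tool A would store the tag next to its output, and Tool B would call verify_file before touching the data. The key could be generated once with secrets.token_bytes(32); hmac.compare_digest is designed to resist timing attacks on the comparison.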
Ahhh! So there is the possibility of storing dataset hashes inside Galaxy’s DB, in the dataset table. There has been some discussion of enabling this for everyone; it would be great to help move that forward.
Then you could maybe periodically scan the datasets, hash them, and check for tampering? Or put that in your tool’s script prefix: pull the hash from the DB and verify it.
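For the periodic scan, a minimal sketch, assuming the recorded hashes have already been pulled out of Galaxy’s DB into a path-to-SHA-256 mapping (the query itself is not shown, and sha256_of/find_tampered are hypothetical helpers):

```python
import hashlib
from typing import Dict, List


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def find_tampered(recorded: Dict[str, str]) -> List[str]:
    """Return paths whose current hash differs from the recorded one, or that vanished."""
    tampered = []
    for path, expected in recorded.items():
        try:
            actual = sha256_of(path)
        except FileNotFoundError:
            tampered.append(path)
            continue
        if actual != expected.lower():
            tampered.append(path)
    return tampered
```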
It might be an interesting development to add something like an optional HMAC to the hashes in the dataset table, perhaps keyed with the Galaxy secret_key? (But that suffers from the same issue of where the key material is stored: if it’s the id_secret, then it’s at risk and might need to be rotated, etc.)
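One way to cope with rotation is to store a key identifier next to each HMAC, so new tags use the current key while old tags keep verifying until their key is retired. A sketch, assuming a hypothetical admin-held dict of key ids to key bytes; this is not Galaxy’s actual dataset table schema:

```python
import hashlib
import hmac
from typing import Dict, Tuple


def _mac(key: bytes, stored_hash: str) -> str:
    return hmac.new(key, stored_hash.encode("utf-8"), hashlib.sha256).hexdigest()


def tag_hash(stored_hash: str, keys: Dict[str, bytes], current_key_id: str) -> Tuple[str, str]:
    """Tag a dataset hash with the current key; store (key_id, mac) alongside it."""
    return current_key_id, _mac(keys[current_key_id], stored_hash)


def check_hash(stored_hash: str, key_id: str, mac: str, keys: Dict[str, bytes]) -> bool:
    """Verify with whichever key made the tag; unknown or retired keys fail closed."""
    key = keys.get(key_id)
    if key is None:
        return False
    return hmac.compare_digest(_mac(key, stored_hash), mac)
```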
Interesting, but I am not sure that would solve the same problem. My problem is not checking whether someone has infiltrated the system and is able to tamper with the dataset files; I happily delegate that responsibility to the Galaxy admins. My goal is to verify that the user has not uploaded a malicious dataset to the tool. More concretely, the tools in question store intermediate data as Python pickle files, which are inherently insecure to use as tool inputs, since unpickling untrusted data can execute arbitrary code.
That’s much easier to do: then you can just check that the datasets came from a specific list of tools. I do something similar in my jbrowse tool, just to print out which tool created a specific dataset. One second, I’ll find some code.
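A minimal sketch of the allowlist check, assuming the wrapper already has the creating tool’s id as a string (how to obtain it inside the wrapper is not shown here) and that ToolShed ids follow the `<shed>/repos/<owner>/<repo>/<tool>/<version>` layout. The version is dropped so any release of an allowed tool passes; base_tool_id, created_by_trusted_tool and the example ids are hypothetical:

```python
from typing import Iterable


def base_tool_id(tool_id: str) -> str:
    """Drop the trailing version from a ToolShed GUID; leave plain ids unchanged."""
    parts = tool_id.split("/")
    # <shed>/repos/<owner>/<repo>/<tool>/<version>
    if len(parts) == 6 and parts[1] == "repos":
        return "/".join(parts[:5])
    return tool_id


def created_by_trusted_tool(tool_id: str, allowed: Iterable[str]) -> bool:
    """True if the creating tool (any version) is on the allowlist."""
    return base_tool_id(tool_id) in {base_tool_id(t) for t in allowed}
```

This only proves provenance as Galaxy recorded it, so it is exactly as trustworthy as that metadata.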
Thanks a lot for this solution! However, would it not be possible for the user to spoof this by downloading a complete history (as a zip), tampering with the dataset while keeping the same tool metadata, and then uploading the history again? Or are there security measures that prevent this?
Ahh, you’ve thought of everything. Yes, that would be an attack vector!
OK, then what you originally suggested is probably the best (or only) way to achieve that. Tool A needs to generate a signature with some key material on your end (id_secret? something else?), and Tool B must validate it.
If you don’t trust the tools with the key material, then maybe you can use a tool prologue/epilogue that does the key generation/validation.
I guess the question is not whether I trust the tool, but whether YOU do! Meaning that support for sharing the ID_SECRET with tools could be misused by malicious tools once installed, regardless of whether the misuse happens in the tool XML or in the dependencies. But I guess the tool XMLs are more carefully scrutinised (automatically/manually?).
In any case, I believe the user can get access to the full command line for a job, so sending a secret key as a command line parameter to the tool would then not be secure.
Security aside, is it possible to get hold of the ID_SECRET or similar in the tool XML?
Ahh these are general purpose tools then for use on other servers? Scary!
I’d just say “this tool will look for the secret in /etc/tool-secret; it must be present or these tools won’t run”. That probably strikes a reasonable balance between security for admins and ease of install/setup. And I’d hardcode the path where it must be found, so the secret doesn’t appear on the CLI.
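A sketch of that fail-closed behaviour, assuming a POSIX system and that the secret file should be readable only by the user the jobs run as (adjust the permission check to your setup); /etc/tool-secret is the path suggested above, and load_tool_secret is a hypothetical helper:

```python
import os
import stat

SECRET_PATH = "/etc/tool-secret"  # hardcoded location the tool documents


def load_tool_secret(path: str = SECRET_PATH) -> bytes:
    """Read the shared secret; exit if it is missing, empty, or group/world accessible."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        raise SystemExit(f"{path} not found: ask your Galaxy admin to install the tool secret")
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise SystemExit(f"{path} must not be accessible by group or others (chmod 600)")
    with open(path, "rb") as fh:
        secret = fh.read().strip()
    if not secret:
        raise SystemExit(f"{path} is empty")
    return secret
```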
Security aside, I believe so, but it’s discouraged. There’s app available, and it’s probably somewhere under there, maybe app.configuration; I’m not entirely sure.
Yup, that is the problem. Apparently, the internally developed tools in question have external dependencies, outside of their control, that use pickle files for intermediate storage.
Regarding the security of passing a key to the tool, I believe the wrapper could just store the key in a temp file for the tool to read instead of passing it on the command line (insecure) or hardcoding a path to a file (inflexible)?
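A sketch of the temp-file idea, assuming a Python wrapper that launches the real tool as a subprocess. Only the file’s path ends up in the child’s environment, never the secret itself; TOOL_SECRET_FILE and run_with_secret_file are hypothetical names, and Galaxy’s own configfiles mechanism may be the more natural route inside a tool XML:

```python
import os
import subprocess
import tempfile
from typing import List


def run_with_secret_file(cmd: List[str], secret: bytes) -> int:
    """Run cmd with the secret in a private temp file; only the path is passed on."""
    fd, path = tempfile.mkstemp(prefix="tool-secret-")  # mkstemp: owner-only read/write
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(secret)
        # Hypothetical variable name that the wrapped tool would read.
        env = dict(os.environ, TOOL_SECRET_FILE=path)
        return subprocess.run(cmd, env=env, check=False).returncode
    finally:
        os.remove(path)  # clean up even if the tool fails
```

The file is removed when the tool exits, but it does sit on the compute node while the job runs, so the job directory’s permissions still matter.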
In any case, couldn’t the tool wrapper just read a secret key from a data table? Is that secure enough? For this to work, the Galaxy admin would need to create the .loc file with a secret key manually, or there could be a data manager tool for the admin to run that generates a random key and stores it in this file. As .loc files typically contain internal server paths (which are semi-sensitive), I would assume the .loc files are already being securely managed by Galaxy. Do you know if that is so?
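A sketch of both halves, assuming a hypothetical three-column, tab-separated .loc layout (value, name, key) that is not taken from any existing data table: new_loc_line is what a data manager might append, and lookup_key is what a wrapper might call.

```python
import secrets
from typing import Optional


def new_loc_line(entry_id: str, name: str) -> str:
    """Line a hypothetical data manager could append: value<TAB>name<TAB>key."""
    return f"{entry_id}\t{name}\t{secrets.token_hex(32)}\n"


def lookup_key(loc_path: str, entry_id: str) -> Optional[str]:
    """Return the key column for entry_id, or None if there is no such entry."""
    with open(loc_path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.rstrip("\r\n")
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines and '#' comments
            fields = line.split("\t")
            if len(fields) >= 3 and fields[0] == entry_id:
                return fields[2]
    return None
```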
A .loc file and a DM would be a very Galaxy way of doing things! For comparison, the Apollo tools I work on require some credentials in a hardcoded place, hence my suggestion of that.
> I believe the wrapper could just store the key in a temp file for the tool to read instead of passing it on the command line (insecure) or hardcoding a path to a file (inflexible)?
Yeah, hence my suggestion of configfiles (which are essentially temp files that are easily available to a tool).
As for .loc files, I think many of us treat them as something completely safe to expose; a lot of people have loc files on GitHub, or at least backed up there. Many of us have stopped treating paths as something secret (see all of our Ansible playbooks online). Main’s are all public in CVMFS, and EU’s are mostly public (it was not a priority to make them reproducible; we wanted to move to CVMFS).
Wouldn’t this mean the user also needs to select the .loc entry in their tool interface to get it passed to the tool?
Yeah, I was thinking of that. Still, the question is not whether many .loc files out there are publicly available, but whether Galaxy exposes the contents of an internally generated .loc file that one wants to keep secure.
That might be so… I am not very experienced in tool wrapper development and have not used .loc files much, and I guess my scenario is in any case stretching the concept of a .loc file. I believe the main point of these files is to point towards reference datasets, which are not typically of a sensitive nature.
Actually, since app is available, I believe I don’t even need to get the SECRET_KEY extracted at all. Instead, I could just call app.security.encode_id(myfilehash) in the wrapper of tool A and store this in the output dataset somehow (e.g. in a YAML file), and in the tool B wrapper, I could extract this encoded string from the input dataset and call app.security.decode_id(myencodedfilehash) to get the original file hash, which I can then use to validate the file contents. Would that not be both secure and transparent enough? It would also not require any changes to the dependencies or the command line, nor would I need to store anything on the server.
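A sketch of that flow, assuming the keyed function is deterministic, so Tool B can re-sign the digest and compare rather than decode it. The sign parameter stands in for app.security.encode_id (or an HMAC); the sidecar is JSON instead of YAML only to stay within the standard library, and dump_signed/load_verified are hypothetical names. The pickle bytes are read once, so the bytes that are checked are the bytes that get unpickled:

```python
import hashlib
import hmac
import json
import pickle
from typing import Any, Callable

# Keyed, deterministic function: hex digest -> token (stand-in, see lead-in).
Signer = Callable[[str], str]


def dump_signed(obj: Any, pkl_path: str, sidecar_path: str, sign: Signer) -> None:
    """Tool A side: write the pickle plus a JSON sidecar holding its signed SHA-256."""
    data = pickle.dumps(obj)
    with open(pkl_path, "wb") as fh:
        fh.write(data)
    with open(sidecar_path, "w", encoding="utf-8") as fh:
        json.dump({"signed_sha256": sign(hashlib.sha256(data).hexdigest())}, fh)


def load_verified(pkl_path: str, sidecar_path: str, sign: Signer) -> Any:
    """Tool B side: check the signature before unpickling anything."""
    with open(sidecar_path, encoding="utf-8") as fh:
        stored = str(json.load(fh)["signed_sha256"])
    with open(pkl_path, "rb") as fh:
        data = fh.read()  # read once: the checked bytes are the unpickled bytes
    if not hmac.compare_digest(sign(hashlib.sha256(data).hexdigest()), stored):
        raise ValueError("pickle failed signature check; refusing to load it")
    return pickle.loads(data)
```

The sidecar is just as user-editable as the pickle, so the scheme only holds as long as the key behind sign never reaches the user.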
@hxr Any thoughts about this solution? Would a tool wrapper for a tool that unpickles pickled files but includes this solution for validating the file contents be approved for installation on usegalaxy.*? @blankenberg?