Hi, I am trying to install Galaxy and to navigate the wealth of documentation about it. I am struggling to understand the role and use of the proxy server; having a proxy server is “strongly recommended”. I have two specific questions.
“Proxying Galaxy with Apache” (Galaxy Project 23.0.2.dev0 documentation) talks about running `sh run.sh` on an Apache proxy server. Can someone explain why this is necessary? I had understood the purpose of the Apache proxy server to be serving static content only. If `run.sh` has to be run on the proxy server, how much else has to be installed there?
I cannot understand the detail of duplicating part of the Galaxy installation on the Apache proxy server. This comprises the full subdirectory tree rooted at `static` and a partial subdirectory tree rooted at `config`. Is it OK to select just the `config/plugins/*/*/static` paths, or will others be needed too? And would it be OK, or even desirable, to delete those paths from the Galaxy server? My feeling is that this is a really clunky way of doing an installation: splitting it over two systems in a way that requires a lot of manual examination of paths.
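To make the question concrete, here is a rough sketch of what I assumed the proxy-side Apache config would look like: the proxy serves `static` from its own disk and forwards everything else to the Galaxy host. The docroot path, the hostname `galaxy-host`, and port 8080 are my guesses, not values from the docs; the file is written to /tmp purely for illustration.

```shell
# Hypothetical proxy-side vhost, written to /tmp for illustration only;
# a real deployment would place it under /etc/apache2/sites-available/.
cat > /tmp/galaxy-demo-vhost.conf <<'EOF'
<VirtualHost *:80>
    # Static content served directly from the proxy machine's own copy
    Alias /static /srv/galaxy/static
    <Directory /srv/galaxy/static>
        Require all granted
    </Directory>

    # Exclude /static from proxying, forward everything else to Galaxy
    ProxyPass /static !
    ProxyPass / http://galaxy-host:8080/
    ProxyPassReverse / http://galaxy-host:8080/
</VirtualHost>
EOF
grep -c ProxyPass /tmp/galaxy-demo-vhost.conf
# → 3
```

If this is roughly the intended shape, I still don’t see where running `run.sh` on the proxy machine comes in.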
Those are the production environment docs. The idea is to separate where the different parts of Galaxy run, in order to maximize the performance of each. Sort of how actual job execution is broken out to a cluster instead of the main node, or how data storage can be offloaded to other specific places where you happen to have disk with high transfer rates available.
At the most basic level, what matters most for a proxy: more expected concurrent users generally translates into a need for a more performant web-server host (…or hosts).
If the server is for personal use, or a small number of non-concurrent users, a proxy is probably not needed as far as I know. But you can double-check the full considerations with the admins in their chat.
Keep in mind that you could choose a configuration that is already packaged, like Docker Galaxy. I’ll add a tag to this post that will link to other posts on this forum with the links and some general troubleshooting: where to source it, plus the deployment-specific administrative docs and chat.
I’m not sure I understand. Do you mean leaving out some of the paths in the proxy config, or removing duplicated paths from the host server that are served by the proxy? Static is relatively small, and config is required everywhere. I would follow the instructions exactly, even if only to make upgrading over time with current methods easier.
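For what it’s worth, the duplication step itself is mechanical. A sketch of the idea, using local /tmp stand-in paths purely for illustration (on a real two-machine setup you would copy over the network instead, e.g. with rsync over SSH):

```shell
# Illustrative only: mirror Galaxy's static tree into the docroot the
# proxy will serve from. All /tmp paths are stand-ins; a real setup
# would do something like: rsync -a galaxy/static/ proxy:/srv/galaxy/static/
mkdir -p /tmp/galaxy-demo/galaxy/static/images
echo placeholder > /tmp/galaxy-demo/galaxy/static/images/logo.png

mkdir -p /tmp/galaxy-demo/proxy-docroot
cp -a /tmp/galaxy-demo/galaxy/static /tmp/galaxy-demo/proxy-docroot/

ls /tmp/galaxy-demo/proxy-docroot/static/images/
# → logo.png
```

The copy is one-way, which is also why I would keep the originals on the Galaxy server: the next upgrade regenerates them there, and you re-sync.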
Those are the production environment docs - It is indeed a production environment, potentially with a large number of concurrent users, that I’m trying to create.
Docker Galaxy - I guess the Docker version of Galaxy would still need to work with a proxy server, so I’m not clear how that would help resolve my lack of understanding.
All options - The link you posted seems to cover many different deployment types - but the one I’m after is “local”, which simply refers back to the documentation I’ve hit a brick wall with.
Do you mean… - I meant: because I need to duplicate those paths on the proxy server, can/should I delete them on the Galaxy server?
Rereading the documentation, I wonder whether “server” means “server software” rather than “server machine”, and whether Apache/NGINX is actually meant to be collocated with Galaxy on the same machine. The instructions would then make more sense. Is that the intention of the documentation?
I don’t have much option other than to try without a proxy, given that the instructions for setting one up do not make sense to me. I’ve hit a brick wall trying to collocate Apache with Galaxy too. So I’m reduced to running with Galaxy’s own web server.
I have to say that the idea of splitting workload over two machines for performance reasons makes sense only when you don’t have control over the capacity of your machines. When you do have such control (for example, when you are installing to a virtual machine) you can size a single machine to cope with any workload. If you are splitting the workload for other reasons, such as resilience or security, that’s a different matter. But I don’t get the impression that the workload split is suggested for those reasons.
This applies in some cases: two or more proxies, to reduce potential downtime.
I think this is part of it too, though I don’t know the full details, except that you can set up several levels of what an “admin” is. Example: restrict ssh permissions to certain servers hosting certain components. That sits outside of Galaxy at a higher level: different from, but possibly overlapping with, whoever has admin permissions in the UI.