Hi! I am working with my organization to set up a Linux virtual machine (RedHat) to run a local Galaxy instance. Our IT department will make sure the required Python is installed. This VM will run behind a firewall, and they’d need to whitelist any access to the outside world. In order to install the Galaxy instance and tools, I was wondering which sites I would need to ask to have whitelisted. I have GitHub and the ToolShed site on my list. Do you happen to know which other sites I’ll need to include? Thank you so much and I’m sorry if this is a silly question.
This is a good question, and you’ll have some decisions to make.
GitHub and the ToolShed are good places to start (for administration).
Public download sites:
You will probably also want to allow download access for any of the Get Data tools that you plan to enable. Also keep in mind that people often download data from NCBI, SRA, UCSC, and similar sites using the Upload tool with URLs. And if you plan to use Data Managers, the sites they source data from would also need to be accessible.
Public upload sites:
You’ll also need to consider whether you plan to allow data to be displayed or used externally. Examples: viewing data in publicly hosted genome browsers, or moving data to public Galaxy servers. This wouldn’t be recommended if the data needs to be protected for privacy and related reasons.
Hope that helps! I’ve also asked our admin team to review and make recommendations, so you may get another reply.
As @jennaj said, that list is quite large and may change without warning in a way that is beyond our control. Are you sure your organization needs to whitelist IPs / URLs, or just ports? The ports would be standard HTTP and HTTPS for the most part. Some tools may require FTP or more exotic protocols, but that’s not common.
If you do need to whitelist IPs or URLs, it might be easiest to set the VM up with internet access first, then restrict it. Installing repositories from the ToolShed also requires access to anaconda.org if you are planning to use Conda dependencies. If you’re planning to use Docker or Singularity for dependencies, these will require access to https://quay.io/.
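If it helps when talking to IT, here is a minimal sketch of a check you could run on the VM to see which hosts are currently reachable over HTTPS. The host list only covers the sites mentioned in this thread, and the Tool Shed hostname is my assumption, so adjust it to whatever you actually use. `check_hosts` and `CANDIDATE_HOSTS` are hypothetical names for illustration, not part of Galaxy. Note that this only tests whether a TCP connection to port 443 opens; it won't catch a proxy that filters by URL, and downloads may redirect to other hosts (CDNs, for example) that this list doesn't cover.

```python
import socket

# Hosts mentioned in this thread. The Tool Shed hostname is an assumption;
# replace or extend this list for your own setup.
CANDIDATE_HOSTS = [
    "github.com",
    "toolshed.g2.bx.psu.edu",  # assumed hostname of the main Tool Shed
    "anaconda.org",            # Conda dependencies
    "quay.io",                 # Docker / Singularity containers
]


def check_hosts(hosts, port=443, timeout=5.0, connect=socket.create_connection):
    """Return {host: bool}: True if a TCP connection to host:port could be opened.

    `connect` can be swapped out (e.g. in tests) and defaults to
    socket.create_connection.
    """
    results = {}
    for host in hosts:
        try:
            conn = connect((host, port), timeout)
            conn.close()
            results[host] = True
        except OSError:
            # Covers DNS failures, timeouts, and refused/blocked connections.
            results[host] = False
    return results
```

Calling `check_hosts(CANDIDATE_HOSTS)` on the VM returns a dictionary you can print or paste into a ticket; any host mapped to `False` is a candidate for the whitelist request.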
Thank you all so much! This has been really helpful. I am planning to use the ARTIC workflow on Galaxy to process SARS-CoV-2 data, so I also had to include the Galaxy Project site, Auspice, Nextstrain/Nextclade, GISAID/EpiCoV, ENA, and the NCBI sites. I am sure I’ll run into more problems with the firewall, but this is a great start.