Much depends on how users will be doing the work. In other words: when do the master server(s) and worker nodes need to be accessible, and does it matter whether user accounts persist over time?
Work will be done during a single time frame, or a series of discrete time frames, with non-persistent accounts.
–OR–
Work will be done asynchronously over longer time frames, throughout the class week/semester/quarter/etc, with persistent accounts.
There are a few variables to play around with. In short:
- Some “initial” image/storage bucket – preconfigured with tools/data the class will be using. There could be one or more of these. Keep these as small as possible.
- Master nodes/buckets – launched from the initial image(s)/bucket(s), kept active or brought up/down or completely discarded, as appropriate.
- Worker nodes – dedicated or on-demand or both, tuned as needed, per master node.
What has worked well in the past, for me and others: start up one master node, or multiple cloned master nodes, from a baseline pre-configured saved image, with ~10 worker nodes per master. Allocate one worker node per user, or possibly 2+ users per worker node, depending on acceptable wait time (sometimes I pair users to work together). The idea is to bring master nodes up when in use and take them down when not in use. Discard them once you are completely done with them.
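The sizing above can be worked out with a little arithmetic. This is only a rough sketch: the function name `plan_cluster` and its parameters are made up for illustration, and the default of 10 workers per master is just the approximate figure mentioned above, not a hard limit.

```python
import math


def plan_cluster(num_users, users_per_worker=1, workers_per_master=10):
    """Return (workers, masters) needed for a class of num_users.

    users_per_worker: 1 for a dedicated worker per user, 2+ if users share
    a worker or are paired up (longer wait times may be acceptable).
    workers_per_master: assumed ~10, per the rule of thumb in this post.
    """
    if num_users < 0 or users_per_worker < 1 or workers_per_master < 1:
        raise ValueError("counts must be non-negative and ratios at least 1")
    # Round up so no user is left without a worker, and no worker without a master.
    workers = math.ceil(num_users / users_per_worker)
    masters = math.ceil(workers / workers_per_master)
    return workers, masters


if __name__ == "__main__":
    for users, sharing in [(30, 1), (30, 2), (12, 1)]:
        workers, masters = plan_cluster(users, users_per_worker=sharing)
        print(f"{users} users, {sharing} per worker: "
              f"{workers} workers on {masters} master(s)")
```

Pairing users halves the worker count, which is often the easiest way to cut cost if some wait time is acceptable.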
The pre-configured image (tools and limited “starting” data, such as indexes, small staged libraries of training data, and shared objects like histories/workflows) can either be saved for reuse, which incurs ongoing costs, or deleted completely at the end (the image itself plus any data buckets attached). It is important to note that costs stop only after everything is shut down and all resources are removed from AWS.
Leaving one or more master nodes continuously active will cost more.
The number of dedicated or on-demand worker nodes per master node can be tuned at any time to help manage costs. Small warning – on-demand worker nodes can get expensive but might be appropriate for some cases or time frames. It just depends on how important it is to get work done quickly versus some acceptable wait time.
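To get a rough feel for the difference between leaving nodes continuously active and bringing them up only when in use, a back-of-the-envelope estimate like the one below can help. Everything here is an assumption for illustration: the function `compute_cost` is hypothetical, the hourly rates are placeholders and not real AWS prices, and storage costs for a saved image or buckets are not included.

```python
def compute_cost(hours_active, masters, workers_per_master,
                 master_rate, worker_rate):
    """Estimate compute cost for masters plus their workers.

    Rates are per node per hour, in whatever currency you are billed in.
    Storage (saved images, buckets) is deliberately left out.
    """
    hourly = masters * (master_rate + workers_per_master * worker_rate)
    return hours_active * hourly


# Placeholder rates, NOT real AWS prices. Substitute current pricing.
MASTER_RATE = 0.50
WORKER_RATE = 0.25

if __name__ == "__main__":
    # Hypothetical class: 10 weeks, one 3-hour session per week,
    # one master with 10 workers.
    sessions_only = compute_cost(10 * 3, 1, 10, MASTER_RATE, WORKER_RATE)
    always_on = compute_cost(10 * 7 * 24, 1, 10, MASTER_RATE, WORKER_RATE)
    print(f"up only during sessions: {sessions_only:.2f}")
    print(f"continuously active:     {always_on:.2f}")
```

Lowering `workers_per_master` between sessions, or during less busy weeks, is one way to model the tuning described above.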
Also, keep in mind that master nodes are independent servers, so accounts and user data (like histories) are not common/shared between them, apart from what was already in the original image each was created from.
Hope that helps!