University of Massachusetts, Amherst, United States of America
Cloud platforms let users rent computing resources on demand to execute their jobs, but buying fixed resources is still much cheaper than renting when utilization is high. Optimizing cloud costs therefore requires users to determine, based on their workload, how many fixed resources to buy and how many to rent. In this paper, we introduce the concept of a waiting policy for cloud-enabled schedulers, which is the dual of a scheduling policy, and show that the optimal cost depends on this policy. We define multiple waiting policies and develop simple analytical models that reveal the trade-off among resource provisioning, cost, and job waiting time. We evaluate the impact of these waiting policies on a year-long production batch workload consisting of 14 million jobs run on a 14.3k-core cluster, and show that a compound waiting policy decreases the cost by 5% and the mean job waiting time by 7x compared to the current cluster.
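To make the buy-versus-rent trade-off concrete, the following minimal sketch computes the total cost of a provisioning choice under a no-waiting baseline, in which any demand above the fixed capacity is rented immediately. It is an illustration under stated assumptions and not the analytical model developed in this paper: the function names, the hourly demand trace, and both prices are hypothetical.

```python
# Illustrative sketch only: names, prices, and the demand trace are
# hypothetical and are not taken from the paper's workload or models.

def total_cost(demand, fixed_cores, fixed_price, rent_price):
    """Cost of serving `demand` (cores needed in each hour) with no waiting.

    Fixed cores are paid for in every hour whether used or not; demand
    above the fixed capacity is rented on demand in that hour.
    """
    fixed = fixed_cores * fixed_price * len(demand)
    rented = sum(max(0, d - fixed_cores) for d in demand) * rent_price
    return fixed + rented


def best_fixed_cores(demand, fixed_price, rent_price):
    """Sweep every candidate fixed capacity and return the cheapest one."""
    return min(
        range(max(demand) + 1),
        key=lambda f: total_cost(demand, f, fixed_price, rent_price),
    )


# Hypothetical six-hour trace with one short burst.
demand = [2, 4, 4, 10, 4, 2]
fixed_price = 1.0  # assumed cost per fixed core-hour
rent_price = 3.0   # assumed cost per rented core-hour

best = best_fixed_cores(demand, fixed_price, rent_price)
cost = total_cost(demand, best, fixed_price, rent_price)
```

With these assumed prices, a fixed core pays for itself only if it is busy more than one third of the time (`fixed_price / rent_price`), so the sweep buys enough cores for the steady load and rents for the burst. A waiting policy changes this picture by letting some jobs queue for fixed resources instead of renting at once, which the sketch deliberately leaves out.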