Right-Sizing Server Capacity Headroom for Global Online Services

2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS) (2018)

Abstract
We present a capacity planning case study showing a significant opportunity to improve the utilization of a large, low-latency, highly available online service comprising 100K+ servers across 9 geographic regions. Analyzing 30 PB of traces over 90 days, we devised a new iterative black-box capacity planning model based on the relationships we discovered between workload, utilization, and quality. We verified the model on thousands of servers, showing capacity reductions of 20% to 40% with effectively no impact on workload latency, availability, or the capacity required for disaster recovery. These results are confirmed experimentally by shrinking production server pools so that the remaining servers run at higher utilization, and by using data from real-world, large-scale unplanned failures. Finally, we show examples of using our model for offline regression analysis to detect critical issues before deployment.
Keywords
micro service, capacity planning, data center, distributed systems, capacity management, optimization, resource usage, cloud computing, server pool, online service