Improving MapReduce Performance in Heterogeneous Environments

OSDI'08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (2008)

Citations: 2437 | Views: 398
Abstract
MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-source implementation of MapReduce enjoying wide adoption and is often used for short jobs where low response time is critical. Hadoop's performance is closely tied to its task scheduler, which implicitly assumes that cluster nodes are homogeneous and tasks make progress linearly, and uses these assumptions to decide when to speculatively re-execute tasks that appear to be stragglers. In practice, the homogeneity assumptions do not always hold. An especially compelling setting where this occurs is a virtualized data center, such as Amazon's Elastic Compute Cloud (EC2). We show that Hadoop's scheduler can cause severe performance degradation in heterogeneous environments. We design a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity. LATE can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
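The abstract does not spell out how LATE chooses which task to re-execute. The following is a minimal Python sketch, assuming the heuristic as described in the full paper. Each task's progress rate is its progress score divided by its elapsed time, and its estimated time to end is (1 - progress score) / progress rate. Among tasks whose rate falls in the slowest quantile, the one expected to finish last is speculated first. For contrast, the sketch also includes the progress-score rule the paper attributes to Hadoop's scheduler: a task is flagged if it has run for at least a minute and its progress score is more than 0.2 below the average. The Task type, the function names, the 25% cutoff, and the task numbers are hypothetical. The sketch also omits LATE's preference for launching copies on fast nodes and its cap on concurrent speculative tasks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    progress_score: float  # fraction of the task's work done, in [0, 1]
    elapsed_s: float       # seconds since the task was launched


def progress_rate(task: Task) -> float:
    # Observed progress per second (assumes elapsed_s > 0).
    return task.progress_score / task.elapsed_s


def estimated_time_left(task: Task) -> float:
    # LATE-style estimate: remaining work divided by the observed rate.
    return (1.0 - task.progress_score) / progress_rate(task)


def hadoop_default_candidates(tasks: List[Task], min_runtime_s: float = 60.0,
                              gap: float = 0.2) -> List[Task]:
    # Progress-score rule: flag tasks that have run at least a minute and whose
    # score trails the average by more than `gap`. Simplification: all tasks
    # are treated as one category (no map/reduce split).
    if not tasks:
        return []
    avg = sum(t.progress_score for t in tasks) / len(tasks)
    return [t for t in tasks
            if t.elapsed_s >= min_runtime_s and t.progress_score < avg - gap]


def pick_speculative_task(tasks: List[Task],
                          slow_fraction: float = 0.25) -> Optional[Task]:
    # Only running tasks with measurable progress can be ranked.
    running = [t for t in tasks
               if 0.0 < t.progress_score < 1.0 and t.elapsed_s > 0]
    if not running:
        return None
    # Cutoff: the progress rate at roughly the `slow_fraction` quantile.
    rates = sorted(progress_rate(t) for t in running)
    cutoff = rates[int(slow_fraction * (len(rates) - 1))]
    slow = [t for t in running if progress_rate(t) <= cutoff]
    # Among slow tasks, speculate on the one expected to finish last.
    return max(slow, key=estimated_time_left)


# Hypothetical snapshot of four running tasks.
tasks = [
    Task("t1", 0.9, 90.0),   # rate 0.010/s, ~10 s left
    Task("t2", 0.8, 80.0),   # rate 0.010/s, ~20 s left
    Task("t3", 0.5, 250.0),  # rate 0.002/s, ~250 s left (genuinely slow)
    Task("t4", 0.3, 60.0),   # rate 0.005/s, ~140 s left (started late)
]

if __name__ == "__main__":
    print("progress-score rule flags:",
          [t.task_id for t in hadoop_default_candidates(tasks)])
    chosen = pick_speculative_task(tasks)
    print("LATE-style pick:", chosen.task_id if chosen else None)
```

With these four hypothetical tasks, the progress-score rule flags t4, which started recently but runs at a healthy rate. The LATE-style sketch instead picks t3, the task with the lowest progress rate and the longest estimated time left (about 250 s). This illustrates why ranking by estimated time to end, rather than by raw progress score, avoids re-executing tasks that are merely young.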
Keywords
Hadoop response time, data mining, low response time, severe performance degradation, task scheduler, virtualized data center, Elastic Compute Cloud, Longest Approximate Time, cluster node, compelling setting, Improving MapReduce performance, heterogeneous environment