Dynamic Load Balancing For Ordered Data-Parallel Regions In Distributed Streaming Systems

MIDDLEWARE(2016)

引用 13|浏览144
暂无评分
摘要
Distributed stream computing has emerged as a technology that can satisfy the low latency, high throughput demands of big data. Stream computing naturally exposes pipeline, task and data parallelism. Meeting the throughput and latency demands of online big data requires exploiting such parallelism across heterogeneous clusters. When a single job is running on a homogeneous cluster, load balancing is important. When multiple jobs are running across a heterogeneous cluster, load balancing becomes critical. The data parallel regions of distributed streaming applications are particularly sensitive to load imbalance, as their overall speed is gated by the slowest performer. We propose a dynamic load balancing technique based on a system artifact: the TCP blocking rate per connection. We build a function for each connection based on this blocking rate, and obtain a balanced load distribution by modeling the problem as a minimax separable resource allocation problem. In other words, we minimize the maximum value of these functions. Our model achieves local load balancing that does not require any global information. We test our model in a real streaming system, and demonstrate that it is able to detect differences in node capacities, determine the correct load distribution for those capacities and dynamically adapt to changes in the system.
更多
查看译文
关键词
stream processing,automatic parallelization,dynamic load balancing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要