Performing Work Efficiently in the Presence of Faults

PODC(1998)

引用 107|浏览2
暂无评分
摘要
We consider a system of t synchronous processes that communicate only by sending messages to one another, and together the processes must perform n independent units of work. Processes may fail by crashing; we want to guarantee that in every execution of the protocol in which at least one process survives, all n units of work will be performed. We consider three parameters: the number of messages sent, the total number of units of work performed (including multiplicities), and time. We present three protocols for solving the problem. All three are work optimal, doing O(n+t) work. The first has moderate costs in the remaining two parameters, sends $O(t\sqrt{t})$ messages, and takes O(n+t) time. This protocol can be easily modified to run in any completely asynchronous system equipped with a failure detection mechanism. The second sends only O(t log t) messages, but its running time is large (O(t2(n+t) 2n+t)). The third is essentially time optimal in the (usual) case in which there are no failures, and its time complexity degrades gracefully as the number of failures increases.
更多
查看译文
关键词
moderate cost,n unit,total number,time optimal,failures increase,n independent unit,asynchronous system,work efficiently,work optimal,failure detection mechanism,time complexity,load balancing,fault tolerance,distributed systems,work
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要