Verify, And Then Trust: Data Inconsistency Detection in ZooKeeper

Sushant Mane, Fangmin Lyu, Benjamin Reed

PaPoC '23: Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data (2023)

Abstract
ZooKeeper masks crash failures of servers to provide a highly available, distributed coordination kernel; however, in production, not all failures are crash failures. Bugs in underlying software systems and hardware can corrupt the ZooKeeper replicas, leading to data loss. Since ZooKeeper is used as a 'source of truth' for mission-critical applications, it is essential to detect data inconsistencies caused by arbitrary faults to safeguard reliability. Byzantine Fault Tolerance (BFT) protocols promise to handle these problems. However, these protocols are expensive in important dimensions: development, deployment, complexity, and performance. ZooKeeper takes an alternative approach that focuses on detecting faulty behavior rather than tolerating it, thus providing improved reliability without paying the full expense of BFT protocols. This paper describes various techniques used for detecting data inconsistencies in ZooKeeper. We also analyze the impact of using these techniques on the reliability and performance of the overall system. Our evaluation shows that a real-time digest-based fault detection technique can be employed in production to provide improved reliability with a minimal performance penalty and no additional operational cost. We hope that our analysis and evaluation can help guide the design of next-generation primary-backup systems aiming to provide high reliability.