Privacy-Enhanced Database Synthesis for Benchmark Publishing
arXiv (2024)
Abstract
Benchmarking is crucial for evaluating a DBMS, yet existing benchmarks often
fail to reflect the varied nature of user workloads. As a result, there is
increasing momentum toward creating databases that incorporate real-world user
data to more accurately mirror business environments. However, privacy concerns
deter users from directly sharing their data, underscoring the importance of
creating synthesized databases for benchmarking that also prioritize privacy
protection. Differential privacy has become a key method for safeguarding
privacy when sharing data, but the focus has largely been on minimizing errors
in aggregate queries or classification tasks, with less attention given to
benchmarking factors like runtime performance. This paper delves into the
creation of privacy-preserving databases specifically for benchmarking, aiming
to produce a differentially private database whose query performance closely
resembles that of the original data. We introduce PrivBench, a synthesis
framework that generates high-quality, privacy-preserving data. PrivBench uses
sum-product networks (SPNs) to partition and sample data, improving how well the
synthetic data represents the original while preserving privacy. The framework
lets users adjust the granularity of SPN partitioning and the privacy settings,
which is crucial for customizing the level of privacy protection. We verify that
our approach, which relies on the Laplace and exponential mechanisms, satisfies
differential privacy. Our experiments show that PrivBench generates
privacy-preserving data with strong query-performance fidelity, consistently
reducing errors in query execution time, query cardinality, and KL divergence.
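The abstract names the building blocks (SPN-based partitioning and sampling, the Laplace and exponential mechanisms, and KL divergence as a fidelity metric) without showing how they fit together. The following is a minimal Python sketch under our own assumptions, not PrivBench's actual algorithm: a hypothetical one-level SPN whose sum node splits rows on a threshold of column 0 chosen by the exponential mechanism (with a toy "balanced clusters" utility), whose product node assumes the columns are independent, and whose leaves are Laplace-noised histograms. The names `fit_toy_spn`, `sample_rows`, and `kl_divergence`, the utility function, and the budget split are all illustrative; the real framework's partitioning criteria and privacy accounting may differ.

```python
import math
import random
from collections import Counter


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng):
    # Select c with probability proportional to exp(epsilon * u(c) / (2 * sensitivity)).
    scores = [epsilon * utility(c) / (2.0 * sensitivity) for c in candidates]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by the max to avoid overflow
    return rng.choices(candidates, weights=weights, k=1)[0]


def noisy_histogram(values, domain, epsilon, rng):
    # Adding or removing one row changes one bin by 1 (L1 sensitivity 1), so
    # Laplace(1/epsilon) noise per bin is epsilon-DP; clipping at 0 is post-processing.
    counts = Counter(values)
    return {v: max(0.0, counts[v] + laplace_noise(1.0 / epsilon, rng)) for v in domain}


def fit_toy_spn(rows, domains, eps_split, eps_leaf, rng):
    """Hypothetical one-level SPN: a sum node splits rows on column 0, and a
    product node treats columns as independent noisy leaf histograms.
    Privacy cost: eps_split + len(domains) * eps_leaf."""
    thresholds = list(domains[0])[1:]  # domains[0] must be sorted

    def balance(t):
        # Toy utility: prefer balanced clusters; sensitivity 1 under add/remove-one-row.
        left = sum(1 for r in rows if r[0] < t)
        return -abs(left - (len(rows) - left))

    t = exponential_mechanism(thresholds, balance, 1.0, eps_split, rng)
    clusters = [[r for r in rows if r[0] < t], [r for r in rows if r[0] >= t]]
    model = []
    for cluster in clusters:  # disjoint clusters: parallel composition
        # Columns within a cluster compose sequentially: eps_leaf each.
        leaves = [noisy_histogram([r[j] for r in cluster], domains[j], eps_leaf, rng)
                  for j in range(len(domains))]
        weight = sum(leaves[0].values())  # noisy cluster size (post-processing)
        model.append((weight, leaves))
    return t, model


def _pick(values, weights, rng):
    # Fall back to uniform if every noisy weight was clipped to 0.
    if sum(weights) <= 0:
        weights = [1.0] * len(values)
    return rng.choices(values, weights=weights, k=1)[0]


def sample_rows(model, n, rng):
    # Sum node: pick a cluster by noisy weight; product node: sample each column independently.
    out = []
    for _ in range(n):
        leaves = _pick([m[1] for m in model], [m[0] for m in model], rng)
        out.append(tuple(_pick(list(h), list(h.values()), rng) for h in leaves))
    return out


def kl_divergence(p_values, q_values, domain, alpha=1e-6):
    # KL(P || Q) between empirical distributions over `domain`; alpha smoothing avoids log(0).
    pc, qc = Counter(p_values), Counter(q_values)
    p_tot = len(p_values) + alpha * len(domain)
    q_tot = len(q_values) + alpha * len(domain)
    kl = 0.0
    for v in domain:
        p, q = (pc[v] + alpha) / p_tot, (qc[v] + alpha) / q_tot
        kl += p * math.log(p / q)
    return kl


# Usage: two integer columns; total budget 0.5 + 2 * 0.25 = 1.0.
rng = random.Random(0)
domains = [list(range(10)), list(range(5))]
data = [(rng.randrange(10), rng.randrange(5)) for _ in range(1000)]
t, model = fit_toy_spn(data, domains, eps_split=0.5, eps_leaf=0.25, rng=rng)
synthetic = sample_rows(model, 1000, rng)
print("split threshold:", t)
print("KL on column 0:", kl_divergence([r[0] for r in data], [r[0] for r in synthetic], domains[0]))
```

Under these assumptions, the usage example spends a total budget of 0.5 + 2 × 0.25 = 1.0: the row clusters are disjoint (parallel composition), while the two column histograms inside a cluster compose sequentially, and the cluster weights are derived from already-noised leaves, so they cost nothing extra. A deeper SPN would apply the split step recursively and spend budget at each level, which is one way a user-adjustable partition granularity could trade fidelity against privacy.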