Sharing is CAIRing: Characterizing Principles and Assessing Properties of Universal Privacy Evaluation for Synthetic Tabular Data
CoRR(2023)
摘要
Data sharing is a necessity for innovative progress in many domains,
especially in healthcare. However, the ability to share data is hindered by
regulations protecting the privacy of natural persons. Synthetic tabular data
provide a promising solution to address data sharing difficulties but does not
inherently guarantee privacy. Still, there is a lack of agreement on
appropriate methods for assessing the privacy-preserving capabilities of
synthetic data, making it difficult to compare results across studies. To the
best of our knowledge, this is the first work to identify properties that
constitute good universal privacy evaluation metrics for synthetic tabular
data. The goal of such metrics is to enable comparability across studies and to
allow non-technical stakeholders to understand how privacy is protected. We
identify four principles for the assessment of metrics: Comparability,
Applicability, Interpretability, and Representativeness (CAIR). To quantify and
rank the degree to which evaluation metrics conform to the CAIR principles, we
design a rubric using a scale of 1-4. Each of the four properties is scored on
four parameters, yielding 16 total dimensions. We study the applicability and
usefulness of the CAIR principles and rubric by assessing a selection of
metrics popular in other studies. The results provide granular insights into
the strengths and weaknesses of existing metrics that not only rank the metrics
but highlight areas of potential improvements. We expect that the CAIR
principles will foster agreement among researchers and organizations on which
universal privacy evaluation metrics are appropriate for synthetic tabular
data.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要