Synthetic Data Generation for Enterprise DBMS.

ICDE (2023)

Abstract

A critical need for enterprise DBMS vendors is to generate synthetic databases for testing their engines and applications in a range of environments. These synthetic databases are targeted toward capturing the desired schematic properties, and the statistical profiles of the data hosted on these schemas.

Several data generation frameworks have been proposed for OLAP over the past three decades. The early efforts focused on ab initio generation based on standard mathematical distributions. Subsequently, there was a shift to database-dependent regeneration, which aims to create a database with similar statistical properties to a specific client database. This client-specific perspective has been taken further in recent times through workload-dependent database regeneration, where the databases generated ensure similar query executions to those observed at the client site.

In this tutorial, we present a holistic coverage of synthetic data generation, highlighting the strengths and limitations of the above-mentioned framework classes. At the end, a suite of open technical problems and future research directions is enumerated.

Keywords

Synthetic Data Generation, DBMS Testing
