Dynamap: Schema Mapping Generation in the Wild

Proceedings of the 31st International Conference on Scientific and Statistical Database Management (2019)

Abstract
Schema mappings enable declarative and executable specification of transformations between different schematic representations of application concepts. Most work on mapping generation has assumed that the source and target schemas are well defined, e.g., with declared keys and foreign keys, and that mapping generation exists to support the data engineer in the labour-intensive process of producing a high-quality integration. However, organizations increasingly have access to numerous independently produced data sets, e.g., in a data lake, and need to produce rapid, best-effort integrations without extensive manual effort. This paper introduces Dynamap, a mapping generation algorithm for such settings, where metadata about sources and the relationships between them is derived from automated data profiling, and where there may be many alternative ways of combining source tables. Our contributions include a dynamic programming algorithm for exploring the space of potential mappings, and techniques for propagating profiling data through mappings, so that the fitness of candidate mappings can be estimated. Experimental results show the effectiveness and scalability of the approach in a variety of synthetic and real-world scenarios.
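The idea of exploring candidate mappings by dynamic programming can be illustrated with a toy sketch. This is not the Dynamap algorithm as published; it is a minimal illustration under stated assumptions. The function name `explore_mappings`, the table and attribute names, the `joinable` pairs (standing in for relationships that profiling might discover), and the fitness measure (number of target attributes covered) are all hypothetical.

```python
def explore_mappings(tables, joinable, target):
    """Enumerate combinations of source tables level by level.

    tables:   dict of table name -> set of attribute names (hypothetical input)
    joinable: set of frozenset({a, b}) pairs assumed to come from profiling
    target:   set of attribute names wanted in the target schema
    Returns a dict: frozenset of table names -> target attributes covered.
    """
    # Level 1: every source table on its own is a candidate mapping.
    memo = {frozenset([name]): attrs & target for name, attrs in tables.items()}
    levels = {1: list(memo)}

    # Level k: merge a smaller candidate with another so that sizes sum to k,
    # reusing the coverage already computed for the parts (the memo table).
    for size in range(2, len(tables) + 1):
        levels[size] = []
        for left_size in range(1, size // 2 + 1):
            for left in levels[left_size]:
                for right in levels[size - left_size]:
                    if left & right:
                        continue  # parts must not share a table
                    combined = left | right
                    if combined in memo:
                        continue  # this set of tables was already built
                    linked = any(
                        frozenset([a, b]) in joinable for a in left for b in right
                    )
                    if not linked:
                        continue  # no known relationship to join on
                    memo[combined] = memo[left] | memo[right]
                    levels[size].append(combined)
    return memo


# Hypothetical sources and target, for illustration only.
tables = {
    "person": {"id", "name"},
    "address": {"person_id", "city"},
    "orders": {"order_id", "person_id", "total"},
}
joinable = {frozenset({"person", "address"}), frozenset({"person", "orders"})}
target = {"name", "city", "total"}

candidates = explore_mappings(tables, joinable, target)
# Prefer the widest coverage of the target, then the fewest source tables.
best = max(candidates, key=lambda k: (len(candidates[k]), -len(k)))
```

In this sketch the estimated fitness is only attribute coverage; the abstract describes propagating profiling data through mappings to estimate fitness, which this toy example does not attempt to reproduce.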