Ursprung: Provenance for Large-Scale Analytics Environments

Proceedings of the 2019 International Conference on Management of Data(2019)

引用 0|浏览36
暂无评分
摘要
Modern analytics has produced wonders, but reproducing and verifying these wonders is difficult. Data provenance helps to solve this problem by collecting information on how data is created and accessed. Although provenance collection techniques have been used successfully on a smaller scale, tracking provenance in large-scale analytics environments is challenging due to the scale of provenance generated and the heterogeneous domains. Without provenance, analysts struggle to keep track of and reproduce their analyses. We demonstrate Ursprung, a provenance collection system specifically targeted at such environments. Ursprung transparently collects the minimal set of system-level provenance required to track the relationships between data and processes. To collect domain specific provenance, Usprung enables users to specify capture rules to curate application-specific logs, intermediate results etc. To reduce storage overhead and accelerate queries, it uses event hierarchies to synthesize raw provenance into compact summaries.
更多
查看译文
关键词
analytics, data provenance, metadata management
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要