Pipegen: Data Pipe Generator For Hybrid Analytics

SoCC '16: ACM Symposium on Cloud Computing Santa Clara CA USA October, 2016(2016)

引用 73|浏览362
暂无评分
摘要
As the number of big data management systems continues to grow, users increasingly seek to leverage multiple systems in the context of a single data analysis task. To efficiently support such hybrid analytics, we develop a tool called PipeGen for efficient data transfer between database management systems (DBMSs). PipeGen automatically generates data pipes between DBMSs by leveraging their functionality to transfer data via disk files using common data formats such as CSV. PipeGen creates data pipes by extending such functionality with efficient binary data transfer capabilities that avoid file system materialization, include multiple important format optimizations, and transfer data in parallel when possible. We evaluate our PipeGen prototype by generating 20 data pipes automatically between five different DBMSs. The results show that PipeGen speeds up data transfer by up to 3.8 x as compared to transferring using disk files.
更多
查看译文
关键词
Hybrid analytics,heterogeneous data transfer
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要