ST4ML: Machine Learning Oriented Spatio-Temporal Data Processing at Scale.

Proc. ACM Manag. Data (2023)

Abstract
Data scientists and researchers use enormous volumes of spatio-temporal data to build machine learning models that solve practical problems in diverse domains, including intelligent transportation, urban planning, epidemic prediction, and many more. Extracting application-specific features from big spatio-temporal data imposes three system requirements: support for heterogeneous data, efficient and scalable computing over the spatial and temporal dimensions, and a user-friendly programming interface. This paper presents ST4ML, a distributed spatio-temporal data processing system that supports scalable machine-learning-oriented applications. We propose a three-stage pipelined computing framework, namely "selection-conversion-extraction", to abstract the distributed computing flow, and implement it on Apache Spark. To the best of our knowledge, ST4ML is the first system of its kind to realize our design considerations. Extensive experiments with real-world datasets show that ST4ML outperforms straightforward extensions of existing ST data processing systems by up to an order of magnitude. ST4ML is open-sourced at https://github.com/Panrong/st4ml.
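The "selection-conversion-extraction" flow named in the abstract can be illustrated with a minimal single-machine sketch in plain Python. This is not ST4ML's API and it does not use Apache Spark; the `Point` type, the function names, the grid-cell conversion, and the per-cell count feature are all assumptions chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class Point(NamedTuple):
    """Hypothetical spatio-temporal record: longitude, latitude, timestamp (s)."""
    lon: float
    lat: float
    t: int


Cell = Tuple[int, int, int]  # (column, row, time slot)


def select(points: List[Point], lon_range, lat_range, t_range) -> List[Point]:
    """Selection: keep only records inside the spatial and temporal window."""
    return [
        p for p in points
        if lon_range[0] <= p.lon < lon_range[1]
        and lat_range[0] <= p.lat < lat_range[1]
        and t_range[0] <= p.t < t_range[1]
    ]


def convert(points: List[Point], lon_range, lat_range, t_range,
            cell_size: float, t_step: int) -> List[Cell]:
    """Conversion: map each record to a (column, row, time slot) grid cell."""
    return [
        (
            int((p.lon - lon_range[0]) // cell_size),
            int((p.lat - lat_range[0]) // cell_size),
            int((p.t - t_range[0]) // t_step),
        )
        for p in points
    ]


def extract(cells: List[Cell]) -> Dict[Cell, int]:
    """Extraction: compute one feature per cell, here a simple record count."""
    return dict(Counter(cells))


def run_pipeline(points, lon_range, lat_range, t_range, cell_size, t_step):
    """Chain the three stages in order."""
    selected = select(points, lon_range, lat_range, t_range)
    cells = convert(selected, lon_range, lat_range, t_range, cell_size, t_step)
    return extract(cells)


if __name__ == "__main__":
    sample = [
        Point(0.5, 0.5, 10),
        Point(0.7, 0.2, 20),
        Point(1.5, 0.5, 70),
        Point(5.0, 5.0, 10),   # outside the spatial window
        Point(0.5, 0.5, 500),  # outside the temporal window
    ]
    print(run_pipeline(sample, (0.0, 2.0), (0.0, 2.0), (0, 120), 1.0, 60))
    # prints {(0, 0, 0): 2, (1, 0, 1): 1}
```

In a distributed setting each stage would presumably operate on partitioned data rather than Python lists; the sketch only shows how the three stages compose.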
Keywords
processing,machine learning,spatio-temporal