DIRAQ: scalable in situ data- and resource-aware indexing for optimized query performance

Cluster Computing (2014)

Abstract
Scientific data analytics in high-performance computing environments has evolved alongside advances in computing capability. With the onset of exascale computing, the widening gap between compute performance and I/O bandwidth has made traditional post-simulation processing tedious. Despite the challenges of increased data production, there is an opportunity to use “cheap” computing power to perform query-driven exploration and visualization during simulation time. To accelerate such analyses, applications traditionally augment raw data, post-simulation, with large indexes, which are then used repeatedly for data exploration. However, generating current state-of-the-art indexes involves compute- and memory-intensive processing, which makes them inapplicable in an in situ context. In this paper we propose DIRAQ, a parallel in situ, in-network data encoding and reorganization technique that transforms simulation output into a query-efficient form, with negligible runtime overhead to the simulation run. DIRAQ’s effective core-local, precision-based encoding approach incorporates an embedded compressed index that is 3–6× smaller than current state-of-the-art indexing schemes. Its data-aware index adjustment improves performance of group-level index layout creation by up to 35
Keywords
Exascale computing,Indexing,Query processing,Compression