Loop parallelization using dynamic commutativity analysis

CGO(2021)

引用 1|浏览20
暂无评分
摘要
ABSTRACTAutomatic parallelization has largely failed to keep its promise of extracting parallelism from sequential legacy code to maximize performance on multi-core systems outside the numerical domain. In this paper, we develop a novel dynamic commutativity analysis (DCA) for identifying parallelizable loops. Using commutativity instead of dependence tests, DCA avoids many of the overly strict data dependence constraints limiting existing parallelizing compilers. DCA extends the scope of automatic parallelization to uniformly include both regular array-based and irregular pointer-based codes. We have prototyped our novel parallelism detection analysis and evaluated it extensively against five state-of-the-art dependence-based techniques in two experimental settings. First, when applied to the NAS benchmarks which contain almost 1400 loops, DCA is able to identify as many parallel loops (over 1200) as the profile-guided dependence techniques and almost twice as many as all the static techniques combined. We then apply DCA to complex pointer-based loops, where it can successfully detect parallelism, while existing techniques fail to identify any. When combined with existing parallel code generation techniques, this results in an average speedup of 3.6× (and up to 55×) across the NAS benchmarks on a 72-core host, and up to 36.9× for the pointer-based loops, demonstrating the effectiveness of DCA in identifying profitable parallelism across a wide range of loops.
更多
查看译文
关键词
commutativity analysis, parallelization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要