The Pochoir Stencil Compiler

Yuan Tang, Rezaul Alam Chowdhury, Bradley C. Kuszmaul, Chi-Keung Luk, Charles E. Leiserson

SPAA (2011)


Abstract

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.
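To make the abstract's definition of a stencil computation concrete, the following is a minimal sketch of the kind of plain loop-based implementation that the abstract uses as a baseline. It is not Pochoir's API and does not implement the trapezoidal or hyperspace-cut algorithm; the function name `heat_1d_periodic`, the three-point heat update, and the parameter `alpha` are assumptions chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Pochoir): runs `steps` time steps of a
// 1D three-point heat stencil on a grid with periodic boundary conditions.
std::vector<double> heat_1d_periodic(std::vector<double> grid, int steps,
                                     double alpha) {
    assert(!grid.empty());
    const std::size_t n = grid.size();
    std::vector<double> next(n);  // second buffer for the new time step
    for (int t = 0; t < steps; ++t) {
        // Each point depends only on the previous time step, so the
        // iterations of this inner loop are independent of one another.
        // A parallel-loop implementation would parallelize this loop.
        for (std::size_t i = 0; i < n; ++i) {
            const double left = grid[(i + n - 1) % n];   // wraps at i == 0
            const double right = grid[(i + 1) % n];      // wraps at i == n-1
            next[i] = grid[i] + alpha * (left - 2.0 * grid[i] + right);
        }
        std::swap(grid, next);  // the new values become the current grid
    }
    return grid;
}
```

For example, calling `heat_1d_periodic({0.0, 4.0, 0.0, 0.0}, 1, 0.25)` spreads the single hot point to its two neighbors. Because the loop sweeps the whole grid once per time step, it reuses little data in cache when the grid is large, which is the inefficiency that cache-oblivious decompositions aim to avoid.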


Keywords

Pochoir compiler, domain-specific stencil language, Pochoir system, parallel cache-efficient stencil, stencil benchmarks, stencil computation, Pochoir stencil compiler, efficient parallel cache-oblivious algorithm, stencil specification, general d-dimensional stencil, c, compiler, cache oblivious algorithm, parallel computation, multicore
