Don't forget about synchronization! Guidelines for using locks on graphics processing units

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE (2022)

Abstract
Heterogeneous devices are becoming necessary components of high performance computing infrastructures, and the graphics processing unit (GPU) plays an important role in this landscape. Given a problem, the established approach for exploiting the GPU is to design solutions that are parallel and free of data dependencies, and then to offload them to the GPU's massively parallel hardware. This design principle often leads to applications that cannot fully utilize the GPU hardware. The goal of this article is to challenge this common belief by showing empirically that allowing even simple forms of synchronization lets programmers design solutions that admit conflicts and still achieve better performance. In our experience, lock-based solutions to the k-means clustering problem, implemented using two well-known locking strategies, outperform the well-engineered, parallel KMCUDA on both synthetic and real datasets. Averaged across all locking algorithms, runtimes are 8x faster on a synthetic dataset and 1.7x faster on a real-world dataset, with maximum speedups of 71.3x and 2.75x, respectively. We validate these results using a more sophisticated clustering algorithm, namely fuzzy c-means, and summarize our findings in three guidelines that help make concurrency effective when programming GPU applications.
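To make the idea of a conflict-admitting, lock-based k-means step concrete, the following is a minimal sketch and not the authors' implementation. It is a CPU-side C++ analogue (std::thread in place of GPU threads), and it assumes a test-and-set spinlock per cluster; the abstract does not name the two locking strategies the article uses. The names SpinLock, ClusterSums, and accumulate_clusters are hypothetical and chosen only for this illustration.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: a test-and-set spinlock (an assumption for this sketch;
// the abstract does not say which locking strategies the article evaluates).
struct SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
    void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Per-cluster running sums and point counts (hypothetical type).
struct ClusterSums {
    std::vector<double> sum;    // k * dim coordinate sums, row-major by cluster
    std::vector<long>   count;  // number of points assigned to each cluster
};

// One accumulation pass of k-means: each thread assigns its points to the
// nearest centroid and updates that cluster's shared sums under the cluster's
// lock. Threads may conflict on the same cluster; the lock serializes them.
// Assumes dim > 0 and at least one centroid.
ClusterSums accumulate_clusters(const std::vector<double>& points,
                                const std::vector<double>& centroids,
                                std::size_t dim, unsigned num_threads) {
    const std::size_t n = points.size() / dim;
    const std::size_t k = centroids.size() / dim;
    ClusterSums acc{std::vector<double>(k * dim, 0.0), std::vector<long>(k, 0)};
    std::vector<SpinLock> locks(k);  // one lock per cluster

    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) {
            // Find the nearest centroid by squared Euclidean distance.
            std::size_t best = 0;
            double best_dist = -1.0;
            for (std::size_t c = 0; c < k; ++c) {
                double dist = 0.0;
                for (std::size_t d = 0; d < dim; ++d) {
                    const double diff = points[i * dim + d] - centroids[c * dim + d];
                    dist += diff * diff;
                }
                if (best_dist < 0.0 || dist < best_dist) {
                    best_dist = dist;
                    best = c;
                }
            }
            // Critical section: update the shared accumulators of one cluster.
            locks[best].lock();
            for (std::size_t d = 0; d < dim; ++d) {
                acc.sum[best * dim + d] += points[i * dim + d];
            }
            ++acc.count[best];
            locks[best].unlock();
        }
    };

    if (num_threads == 0) num_threads = 1;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(n, t * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& th : threads) th.join();
    return acc;
}
```

Dividing each cluster's sum by its count would give the updated centroid. On a GPU the same structure would typically be built on device atomics rather than std::atomic_flag, and thread scheduling within a warp needs extra care, so this sketch says nothing about the speedups reported in the abstract.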
Keywords
concurrency, GPU, k-means, synchronization