Automating And Optimizing Data Transfers For Many-Core Coprocessors

ICS(2014)

Abstract
Manually orchestrating data transfers between CPUs and a coprocessor is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance. We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation and evaluated it on multiple C benchmarks. Our experimental results demonstrate that the best compile-time solution runs 2.5x-5x faster than the original runtime solution, and that the CPU-coprocessor code using it achieves a 1.5x-2.5x speedup over the 16-thread CPU version.
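As a rough illustration of why multi-level pointers complicate transfers, the sketch below shows one plausible reading of linearization followed by a pointer reset, run entirely on the host. It is not the paper's implementation: the helper names `linearize` and `reset_pointers` are hypothetical, and the sketch assumes a rectangular array of `double` whose rows were allocated separately.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a rows x cols array that is reachable only
 * through separately allocated row pointers into one contiguous buffer.
 * A contiguous buffer can be moved with a single bulk transfer. */
static double *linearize(double **a, size_t rows, size_t cols) {
    assert(a != NULL);
    double *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < rows; i++) {
        memcpy(flat + i * cols, a[i], cols * sizeof *flat);
    }
    return flat;
}

/* Hypothetical helper: build a fresh row-pointer table whose entries
 * point into the contiguous buffer, so code written as view[i][j]
 * keeps working on the linearized copy. */
static double **reset_pointers(double *flat, size_t rows, size_t cols) {
    assert(flat != NULL);
    double **view = malloc(rows * sizeof *view);
    if (view == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < rows; i++) {
        view[i] = flat + i * cols;
    }
    return view;
}
```

In this sketch the caller would pass the original `double **` to `linearize`, ship the returned buffer in one piece, and call `reset_pointers` on the receiving side so that existing `a[i][j]` accesses need no rewriting. Both returned allocations are owned by the caller and released with `free`.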
Keywords
Coprocessors,Static Analysis,Runtime Analysis,Offloading