COMP: Compiler Optimizations for Manycore Processors

MICRO(2014)

Abstract
Applications executing on multicore processors can now easily offload computations to manycore processors, such as Intel Xeon Phi coprocessors. However, tuning such offloaded applications for high-performance execution requires considerable expertise and effort. Previous efforts have focused on optimizing the execution of offloaded computations on manycore processors. We observe, however, that the data transfer overhead between multicore and manycore processors and the limited device memory of manycore processors often constrain the performance gains that offloading can deliver. In this paper, we present three source-to-source compiler optimizations that can significantly improve the performance of applications that offload computations to manycore processors. The first optimization automatically transforms offloaded code to enable data streaming, which overlaps data transfer between multicore and manycore processors with computation on these processors to hide the data transfer overhead. This optimization is also designed to minimize memory usage on manycore processors while achieving optimal performance. The second optimization re-orders computations to regularize irregular memory accesses. It enables data streaming and vectorization on manycore processors even when the memory access patterns in the original source code are irregular. Finally, our new shared memory mechanism provides efficient support for transferring large pointer-based data structures between hosts and manycore processors. Our evaluation shows that the proposed compiler optimizations benefit 9 out of 12 benchmarks. Compared with simply offloading the original parallel implementations of these benchmarks, we achieve speedups of 1.16x to 52.21x.
Keywords
offload, data streaming, compiler optimization for manycore processors, pointer-based data structures, storage management, vectorization, data structures, memory access patterns, source code (software), shared memory mechanism, manycore coprocessors, data transfer overhead hiding, offloaded codes, shared memory systems, memory usage minimization, compiler optimizations, optimising compilers, source-to-source compiler optimizations, multicore processors, Intel MIC, benchmark testing, coprocessors, data transfer, multicore processing, resource utilization, optimization
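
The data streaming idea in the abstract (overlapping transfers with computation while keeping device memory usage small) can be illustrated with a minimal double-buffering sketch. This is not the paper's compiler transformation or its generated code. The names `transfer`, `compute`, and `streamed_offload` are hypothetical, and a host-side thread stands in for the host-to-device copy purely to show the overlap pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer(host_data, start, size):
    # Hypothetical stand-in for a host-to-device copy: just copies a slice.
    return list(host_data[start:start + size])


def compute(chunk):
    # Hypothetical stand-in for the offloaded kernel: sum of squares.
    return sum(x * x for x in chunk)


def streamed_offload(host_data, chunk_size):
    """Process host_data chunk by chunk, fetching the next chunk while the
    previous one is being computed (double buffering)."""
    total = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None  # future for the chunk whose transfer was started last
        for start in range(0, len(host_data), chunk_size):
            # Start transferring chunk k ...
            nxt = pool.submit(transfer, host_data, start, chunk_size)
            # ... while computing on chunk k-1, if there is one.
            if pending is not None:
                total += compute(pending.result())
            pending = nxt
        # Drain the final chunk.
        if pending is not None:
            total += compute(pending.result())
    return total
```

Because at most two chunks are in flight at any time (one being computed, one being transferred), the buffer footprint is bounded by roughly `2 * chunk_size` elements regardless of the input size, which mirrors the memory-minimization goal stated in the abstract.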