PragFormer: Data-driven Parallel Source Code Classification with Transformers

Research Square (Research Square)(2023)

引用 0|浏览13
暂无评分
摘要
Abstract Multi-core shared memory architectures have become ubiquitous in computing hardware nowadays. As a result, there is a growing need to fully utilize these architectures by introducing appropriate parallelization schemes, such as OpenMP worksharing-loop constructs, to applications. However, most developers find introducing OpenMP directives to their code hard due to pervasive pitfalls in managing parallel shared memory. To assist developers in this process, many compilers, as well as source-to-source (S2S) translation tools, have been developed over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. Recently, many data-driven AI-based code completion (CC) tools, such as GitHub CoPilot, have been developed to ease and improve programming productivity. Leveraging the insights from existing AI-based programming-assistance tools, this work presents a novel AI model that can serve as a parallel-programming assistant ∗ . Specifically, our model, named PragFormer, is tasked with identifying for loops that can benefit from conversion to parallel worksharing-loop construct (OpenMP directive) and even predict the need for specific data-sharing attributes clauses on the fly. We created a unique database, named Open-OMP, specifically for this goal. Open-OMP contains over 32,000 unique code snippets from different domains, half of which contain OpenMP directives, while the other half do not. We experimented with different model design parameters for these tasks and showed that our best-performing model outperforms a statistically-trained baseline as well as a state-of-the-art S2S compiler. In fact, it even outperforms the popular generative AI model of ChatGPT. In the spirit of advancing research on this topic, we have already released source code for PragFormer as well as Open-OMP dataset to public † . Moreover, an interactive demo of our tool, as well as a Hugging Face webpage ‡ to experiment with our tool, are already available.
更多
查看译文
关键词
parallel source code classification,data-driven
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要