Hector: An Efficient Programming and Compilation Framework for Implementing Relational Graph Neural Networks in GPU Architectures
Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (2023)
Abstract
Relational graph neural networks (RGNNs) are graph neural networks with
dedicated structures for modeling the different types of nodes and edges in
heterogeneous graphs. While RGNNs have been increasingly adopted in many
real-world applications due to their versatility and accuracy, they pose
performance and system design challenges: inherent memory-intensive computation
patterns, the gap between the programming interface and kernel APIs, and heavy
programming effort in optimizing kernels caused by their coupling with data
layout and heterogeneity. To systematically address these challenges, we
propose Hector, a novel two-level intermediate representation and code
generation framework that (a) captures the key properties of RGNN models and
the opportunities to reduce memory accesses in inter-operator scheduling and
materialization, (b) generates code with flexible data access schemes to
eliminate redundant data copies, and (c) decouples model semantics, data
layout, and operator-specific optimizations from one another to reduce
programming effort. By building on one general matrix multiply (GEMM)
template and a
node/edge traversal template, Hector achieves up to 9.9x speed-up in inference
and 43.7x speed-up in training compared with the state-of-the-art public
systems on select models, i.e., RGCN, RGAT and HGT, when running heterogeneous
graphs provided by Deep Graph Library (DGL) and Open Graph Benchmark (OGB). In
addition, Hector does not trigger any out-of-memory (OOM) exception in these
tests. We also propose linear operator reordering and compact materialization,
which further accelerate the system by up to 3.8x. As an indicator of programming
effort reduction, Hector takes in 51 lines of code expressing the three models
and generates a total of 8K lines of CUDA and C++ code.
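To make the GEMM-template idea concrete, below is a minimal, hypothetical sketch (not Hector's actual IR or generated code) of an RGCN-style layer: for each relation type, messages are produced by the same dense multiply pattern applied with a relation-specific weight matrix, then scatter-added to destination nodes. The function names `matvec` and `rgcn_layer` and the data layout are illustrative assumptions only.

```python
# Hypothetical illustration, not Hector's code: an RGCN-style layer applies
# a relation-specific weight W_r to each source feature and accumulates the
# result at the destination node. One dense-multiply (GEMM) template could
# instantiate this pattern for every relation; only W_r and the edge list vary.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rgcn_layer(h, weights, edges_by_rel):
    """h: per-node feature vectors; weights: {relation: out_dim x in_dim matrix};
    edges_by_rel: {relation: list of (src, dst) pairs}."""
    out_dim = len(next(iter(weights.values())))
    out = [[0.0] * out_dim for _ in h]
    for rel, W in weights.items():
        for src, dst in edges_by_rel[rel]:
            msg = matvec(W, h[src])                      # per-relation transform
            out[dst] = [a + b for a, b in zip(out[dst], msg)]
    return out
```

In a real heterogeneous-graph system the per-edge loop would be batched into one GEMM per relation over gathered source features, which is where inter-operator scheduling and materialization decisions determine how much intermediate data is copied.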