DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs

About

Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate computation patterns. The execution of AT-GNN operations without kernel fusion results in heavy data movement and significant kernel launch overhead, while fixed thread scheduling in existing GNN kernel fusion strategies leads to sub-optimal performance, redundant computation and unbalanced workload. To address these challenges, we propose a dynamic kernel fusion framework, DF-GNN, for the AT-GNN family. DF-GNN introduces a dynamic bi-level thread scheduling strategy, enabling flexible adjustments to thread scheduling while retaining the benefits of shared memory within the fused kernel. DF-GNN tailors specific thread scheduling for operations in AT-GNNs and considers the performance bottleneck shift caused by the presence of super nodes. Additionally, DF-GNN is integrated with the PyTorch framework for high programmability. Evaluations across diverse GNN models and multiple datasets reveal that DF-GNN surpasses existing GNN kernel optimization works like cuGraph and dgNN, with speedups up to $7.0\times$ over the state-of-the-art non-fusion DGL sparse library. Moreover, it achieves an average speedup of $2.16\times$ in end-to-end training compared to the popular GNN computing framework DGL.
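To make the data-movement argument concrete, the following is a minimal CPU sketch in pure Python, not DF-GNN's actual GPU kernels. It assumes dot-product attention in the style of a Graph Transformer and a toy graph stored in CSR form. The stage names SDDMM, edge softmax, and SpMM are common terms for these steps rather than names taken from the abstract, and all function names and data are hypothetical. The unfused version materializes a full per-edge array between stages, whereas the fused version keeps each node's scores in a small local buffer, which is roughly the role shared memory plays in a fused GPU kernel. The sketch does not model the dynamic bi-level thread scheduling.

```python
import math

# Toy graph in CSR form (hypothetical data): the in-neighbors of destination
# node i are indices[indptr[i]:indptr[i + 1]].
indptr = [0, 2, 3, 6]
indices = [1, 2, 0, 0, 1, 2]

# Per-node query, key, and value features (hypothetical values).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.5, 0.5], [2.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def unfused(indptr, indices, Q, K, V):
    """Three separate passes, each writing a full per-edge or per-node array."""
    n, dim = len(indptr) - 1, len(V[0])
    # Stage 1 (SDDMM): one attention score per edge, stored for every edge.
    scores = [0.0] * len(indices)
    for i in range(n):
        for e in range(indptr[i], indptr[i + 1]):
            scores[e] = dot(Q[i], K[indices[e]])
    # Stage 2 (edge softmax): normalize scores over each node's incoming edges.
    alpha = [0.0] * len(indices)
    for i in range(n):
        lo, hi = indptr[i], indptr[i + 1]
        if lo == hi:
            continue
        m = max(scores[lo:hi])  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores[lo:hi]]
        z = sum(exps)
        for k in range(hi - lo):
            alpha[lo + k] = exps[k] / z
    # Stage 3 (SpMM): attention-weighted sum of neighbor values.
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for e in range(indptr[i], indptr[i + 1]):
            for d in range(dim):
                out[i][d] += alpha[e] * V[indices[e]][d]
    return out


def fused(indptr, indices, Q, K, V):
    """One pass per destination node; intermediates never leave the loop body."""
    n, dim = len(indptr) - 1, len(V[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        lo, hi = indptr[i], indptr[i + 1]
        if lo == hi:
            continue
        # Local buffer standing in for shared memory in a fused kernel.
        local = [dot(Q[i], K[indices[e]]) for e in range(lo, hi)]
        m = max(local)
        weights = [math.exp(s - m) for s in local]
        z = sum(weights)
        for w, e in zip(weights, range(lo, hi)):
            for d in range(dim):
                out[i][d] += (w / z) * V[indices[e]][d]
    return out
```

Both functions compute the same result; the difference is only in how much intermediate state is written out between steps. In this sketch the fused loop assigns work strictly per destination node, which is the kind of fixed assignment that becomes unbalanced when a few super nodes have far more edges than the rest.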

Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang • 2024

Related benchmarks

Task | Dataset | Result | Rank
Graph Transformer Efficiency Benchmarking | Ogbn-arxiv | Latency Speedup (Forward): 1.62 | 9
Graph Transformer Efficiency Benchmarking | city-roads L | Latency Speedup (Forward): 1.13 | 7
Graph Transformer Efficiency Benchmarking | city-roads M | Latency Speedup (Forward): 1.12 | 5
Graph Transformer Efficiency Benchmarking | artnet-exp | Latency Speedup (Forward Pass): 1.19 | 3
