LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs

About

The widespread application of Large Language Models (LLMs) has motivated a growing interest in their capacity for processing dynamic graphs. Temporal motifs, as an elementary unit and important local property of dynamic graphs which can directly reflect anomalies and unique phenomena, are essential for understanding their evolutionary dynamics and structural features. However, leveraging LLMs for temporal motif analysis on dynamic graphs remains relatively unexplored. In this paper, we systematically study LLM performance on temporal motif-related tasks. Specifically, we propose a comprehensive benchmark, LLMTM (Large Language Models in Temporal Motifs), which includes six tailored tasks across nine temporal motif types. We then conduct extensive experiments to analyze the impacts of different prompting techniques and LLMs (including nine models: openPangu-7B, the DeepSeek-R1-Distill-Qwen series, Qwen2.5-32B-Instruct, GPT-4o-mini, DeepSeek-R1, and o3) on model performance. Informed by our benchmark findings, we develop a tool-augmented LLM agent that leverages precisely engineered prompts to solve these tasks with high accuracy. Nevertheless, the high accuracy of the agent incurs a substantial cost. To address this trade-off, we propose a simple yet effective structure-aware dispatcher that considers both the dynamic graph's structural properties and the LLM's cognitive load to intelligently dispatch queries between the standard LLM prompting and the more powerful agent. Our experiments demonstrate that the structure-aware dispatcher effectively maintains high accuracy while reducing cost.
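To make the notion of a temporal motif concrete, the following is a minimal sketch that finds one illustrative motif, a temporal triangle, in a dynamic graph. It is not the paper's benchmark code, and the nine motif types used in LLMTM are not defined on this page. The edge format `(u, v, t)`, the window parameter `delta`, and the function name `find_temporal_triangles` are all assumptions made for illustration.

```python
from itertools import combinations


def find_temporal_triangles(edges, delta):
    """Return temporal triangles as tuples of three (u, v, t) edges.

    Assumed (hypothetical) definition: three undirected edges that close a
    triangle on three distinct nodes, with strictly increasing timestamps
    and at most `delta` time units between the first and the last edge.
    """
    ordered = sorted(edges, key=lambda e: e[2])  # sort by timestamp
    found = []
    # Brute force over edge triples; fine for small illustrative graphs only.
    for e1, e2, e3 in combinations(ordered, 3):
        increasing = e1[2] < e2[2] < e3[2]
        within_window = e3[2] - e1[2] <= delta
        if not (increasing and within_window):
            continue
        pairs = {frozenset(e[:2]) for e in (e1, e2, e3)}
        nodes = set().union(*pairs)
        no_self_loops = all(len(p) == 2 for p in pairs)
        # Three distinct node pairs over exactly three nodes form a triangle.
        if len(pairs) == 3 and len(nodes) == 3 and no_self_loops:
            found.append((e1, e2, e3))
    return found


edges = [(0, 1, 1), (1, 2, 2), (2, 0, 3), (0, 3, 9)]
print(find_temporal_triangles(edges, delta=3))
# [((0, 1, 1), (1, 2, 2), (2, 0, 3))]
print(find_temporal_triangles(edges, delta=1))
# []
```

The brute-force search is cubic in the number of edges, which is acceptable for a toy example but would need a windowed or indexed search on graphs of realistic size.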

Bing Hao, Minglai Shao, Zengyi Wo, Yunlong Chu, Yuhang Liu, Ruijie Wang • 2025

Related benchmarks

Task	Dataset	Result
Sort Edge	fundamental dynamic graph tasks Level 0	--	20
Motif Classification	LLMTM 1.0 (test)	--	12
Reverse Graph	fundamental dynamic graph tasks Level 0	--	10
When Link and Dislink	fundamental dynamic graph tasks Level 0	--	10
Motif Construction	Motif Construction various temporal motifs	--	9
Motif Detection	Motif Detection	--	9
Motif Occurrence Prediction	LLMTM Level 2 1.0 (test)	--	9
Multi-Motif Counting	LLMTM Level 2 1.0 (test)	--	9
Multi-Motif Detection	LLMTM Level 2 1.0 (test)	--	9
Single-temporal motif recognition	LLMTM standard (test)	3-star Acc100	6

Showing 10 of 21 rows
