
ECHO: Event-Centric Hypergraph Operations via Multi-Agent Collaboration for Multimedia Event Extraction

About

Multimedia event extraction (M2E2) aims to predict triggers, ground arguments across text and images, and then assemble them into schema-consistent event records. Recent LLM-based approaches have shown strong potential for M2E2, but their intermediate event hypotheses often remain implicit, and event-argument linking is still tightly coupled with role binding. This leaves little opportunity to inspect or revise intermediate event hypotheses and makes predictions brittle to early errors. To bridge this gap, we present ECHO, a multi-agent framework that reframes M2E2 as iterative refinement over an explicit Multimedia Event Hypergraph (MEHG). Instead of relying on implicit linear generation, ECHO performs auditable atomic updates over a shared hypergraph, making intermediate event structures explicit and revisable. Furthermore, we introduce a Link-then-Bind strategy that decouples event-argument linking from role binding, reducing premature semantic commitment during structured prediction. Extensive experiments on the M2E2 benchmark show that ECHO consistently outperforms prior state-of-the-art approaches, achieving gains of 7.3 and 15.5 F1 points on event mention and argument role, respectively.
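The abstract describes ECHO only at a high level. Below is a minimal Python sketch, built on our own assumptions, of the two ideas it names. The first is an explicit event hypergraph that changes only through logged atomic operations, so intermediate hypotheses stay inspectable and revisable. The second is a Link-then-Bind split, in which an argument is first attached to an event and only later assigned a role. All names here (`EventHypergraph`, `link`, `unlink`, `bind`, `SCHEMA`, the `Conflict.Attack` role set) are hypothetical and are not taken from the paper. The multi-agent layer that decides which operations to issue is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical role schema for illustration only; not the paper's ontology.
SCHEMA: Dict[str, Set[str]] = {
    "Conflict.Attack": {"Attacker", "Target", "Instrument", "Place"},
}


@dataclass
class Node:
    node_id: str
    modality: str  # "text" or "image"
    content: str   # a text span, or a description of an image region


@dataclass
class EventEdge:
    trigger: str
    event_type: str
    linked: List[str] = field(default_factory=list)      # Link stage: no roles yet
    roles: Dict[str, str] = field(default_factory=dict)  # Bind stage: node_id -> role


class EventHypergraph:
    """Shared, explicit event state that is edited only through logged atomic operations."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.events: Dict[str, EventEdge] = {}
        self.log: List[Tuple[str, ...]] = []  # audit trail: one entry per update

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.log.append(("add_node", node.node_id))

    def add_event(self, event_id: str, trigger: str, event_type: str) -> None:
        self.events[event_id] = EventEdge(trigger, event_type)
        self.log.append(("add_event", event_id, event_type))

    def link(self, event_id: str, node_id: str) -> None:
        # Stage 1: attach a candidate argument (text or image) without choosing a role.
        if node_id not in self.nodes:
            raise KeyError(node_id)
        edge = self.events[event_id]
        if node_id not in edge.linked:
            edge.linked.append(node_id)
            self.log.append(("link", event_id, node_id))

    def unlink(self, event_id: str, node_id: str) -> None:
        # Revision: drop a wrong link together with any role already bound to it.
        edge = self.events[event_id]
        edge.linked.remove(node_id)
        edge.roles.pop(node_id, None)
        self.log.append(("unlink", event_id, node_id))

    def bind(self, event_id: str, node_id: str, role: str) -> None:
        # Stage 2: assign a role only to a linked node, checked against the schema.
        edge = self.events[event_id]
        if node_id not in edge.linked:
            raise ValueError(f"{node_id} must be linked before it is bound")
        if role not in SCHEMA.get(edge.event_type, set()):
            raise ValueError(f"role {role!r} is not allowed for {edge.event_type}")
        edge.roles[node_id] = role
        self.log.append(("bind", event_id, node_id, role))

    def records(self) -> List[dict]:
        # Assemble final event records from bound arguments only.
        return [
            {
                "event_type": e.event_type,
                "trigger": e.trigger,
                "arguments": sorted((r, self.nodes[n].content) for n, r in e.roles.items()),
            }
            for e in self.events.values()
        ]


g = EventHypergraph()
g.add_node(Node("t1", "text", "rebels"))
g.add_node(Node("t2", "text", "market"))
g.add_node(Node("t3", "text", "reporters"))
g.add_node(Node("i1", "image", "image region: artillery piece"))
g.add_event("e1", "shelled", "Conflict.Attack")
for n in ("t1", "t2", "t3", "i1"):
    g.link("e1", n)
g.unlink("e1", "t3")  # a later pass revises an early, wrong link
g.bind("e1", "t1", "Attacker")
g.bind("e1", "t2", "Target")
g.bind("e1", "i1", "Instrument")
print(g.records())
```

Keeping `link` and `bind` as separate operations means a doubtful role does not force an early commitment on which arguments belong to the event, and `unlink` plus the `log` list provide the kind of revisable, auditable trail the abstract describes. How ECHO's agents actually propose and validate operations is not covered by this sketch.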

Hailong Chu, Hongbing Li, Yunlong Chu, Shutai Huang, Xingyue Zhang, Tinghe Yan, Jinsong Zhang, Shuo Zhang, Lei Li • 2026

Related benchmarks

Task                          Dataset                   Metric     Value  Rank
Argument Role Extraction      M2E2 multimedia           F1 Score   55     28
Event Mention Identification  M2E2 text-only            Precision  63.7   26
Argument Role Extraction      M2E2 text-only            Precision  33.6   26
Event Mention Extraction      M2E2 Visual Events        F1 Score   82.1   16
Argument Role Extraction      M2E2 Visual Events        Precision  62.7   15
Event Mention Extraction      M2E2 (Multimedia Events)  Precision  79.6   12
