
Cross-media Structured Common Space for Multimedia Event Extraction

About

We introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. We develop the first benchmark and collect a dataset of 245 multimedia news articles with extensively annotated events and arguments. We propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. The structures are aligned across modalities by employing a weakly supervised training strategy, which enables exploiting available resources without explicit cross-media annotation. Compared to uni-modal state-of-the-art methods, our approach achieves 4.0% and 9.8% absolute F-score gains on text event argument role labeling and visual event extraction. Compared to state-of-the-art multimedia unstructured representations, we achieve 8.3% and 5.0% absolute F-score gains on multimedia event extraction and argument role labeling, respectively. By utilizing images, we extract 21.4% more event mentions than traditional text-only methods.
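The core idea above is aligning textual and visual embeddings in a common space without paired cross-media annotation. A common way to realize such weakly supervised alignment (a minimal sketch, not the paper's actual WASE implementation) is a max-margin triplet loss that pulls embeddings from the same document together and pushes in-batch mismatches apart; the function names below are illustrative:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def weak_alignment_loss(text_emb, image_emb, margin=0.2):
    """Bidirectional max-margin loss over a batch of weakly paired documents.

    text_emb, image_emb: (B, D) arrays; row i of each comes from the same
    document (weak supervision), so the diagonal of the similarity matrix
    holds the positive pairs and off-diagonal entries are negatives.
    """
    t = l2_normalize(text_emb)
    v = l2_normalize(image_emb)
    sim = t @ v.T                      # (B, B) cosine similarity matrix
    pos = np.diag(sim)                 # similarity of weakly aligned pairs
    # Hinge: penalize any negative that comes within `margin` of the positive,
    # in both retrieval directions (text -> image and image -> text).
    cost_t2v = np.maximum(0.0, margin + sim - pos[:, None])
    cost_v2t = np.maximum(0.0, margin + sim - pos[None, :])
    np.fill_diagonal(cost_t2v, 0.0)    # positives incur no cost
    np.fill_diagonal(cost_v2t, 0.0)
    return (cost_t2v.sum() + cost_v2t.sum()) / len(pos)
```

With perfectly aligned, well-separated embeddings the loss is zero; random embeddings incur a positive penalty, which is the gradient signal that draws the two modalities into a shared space.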

Manling Li, Alireza Zareian, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang • 2020

Related benchmarks

| Task | Dataset | Metric | Score | Rank |
| --- | --- | --- | --- | --- |
| Argument Role Extraction | M2E2 multimedia | F1 (%) | 19.9 | 15 |
| Event Mention Identification | M2E2 multimedia | F1 (%) | 50.8 | 15 |
| Argument Role Extraction | M2E2 image-only | Precision (%) | 14.5 | 14 |
| Event Mention Identification | M2E2 image-only | Precision (%) | 43.1 | 14 |
| Argument Role Extraction | M2E2 text-only | Precision (%) | 27.5 | 13 |
| Event Mention Identification | M2E2 text-only | Precision (%) | 42.8 | 13 |
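The benchmarks above mix F1 and precision. For reference, F1 is the harmonic mean of precision and recall; the values in this snippet are illustrative, not taken from the paper:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative: equal precision and recall give an F1 of the same value,
# while an imbalance pulls F1 toward the smaller of the two.
balanced = f1_score(0.5, 0.5)      # -> 0.5
skewed = f1_score(0.9, 0.1)        # -> 0.18
```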
