
Towards Universal Soccer Video Understanding

About

As a globally celebrated sport, soccer has attracted widespread interest from fans all over the world. This paper aims to develop a comprehensive multi-modal framework for soccer video understanding. Specifically, we make the following contributions in this paper: (i) we introduce SoccerReplay-1988, the largest multi-modal soccer dataset to date, featuring videos and detailed annotations from 1,988 complete matches, with an automated annotation pipeline; (ii) we present an advanced soccer-specific visual encoder, MatchVision, which leverages spatiotemporal information across soccer videos and excels in various downstream tasks; (iii) we conduct extensive experiments and ablation studies on event classification, commentary generation, and multi-view foul recognition. MatchVision demonstrates state-of-the-art performance on all of them, substantially outperforming existing models, which highlights the superiority of our proposed data and model. We believe that this work will offer a standard paradigm for sports understanding research.

Jiayuan Rao, Haoning Wu, Hao Jiang, Ya Zhang, Yanfeng Wang, Weidi Xie • 2024

Related benchmarks

Task                        Dataset                     Metric          Result  Rank
Lines Detection             Soccer Pretraining Dataset  Accuracy        90.3    6
Athlete Detection           Soccer Pretraining Dataset  AP@50           51.9    6
Keypoints Detection         Soccer Pretraining Dataset  Accuracy        92      6
Event Classification        Soccer Pretraining Dataset  Accuracy        0.653   4
Commentary Generation       SN-Caption (test-align)     BLEU@1          30.9    3
Video-Commentary Alignment  Soccer Pretraining Dataset  Top-1 Accuracy  4       3