SoccerMaster: A Vision Foundation Model for Soccer Understanding

About

Soccer understanding has recently attracted growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works, which typically rely on isolated, task-specific expert models, this work proposes a unified model that handles diverse soccer visual understanding tasks, ranging from fine-grained perception (e.g., athlete detection and identification) to high-level semantic reasoning (e.g., event classification). Concretely, our contributions are threefold: (i) we present SoccerMaster, the first soccer-specific vision foundation model, which unifies diverse tasks within a single framework via supervised multi-task pretraining; (ii) we develop SoccerFactory, an automated data curation pipeline that generates spatial annotations at scale, and integrate it with multiple existing soccer video datasets into a comprehensive resource for multi-task pretraining; and (iii) we conduct extensive evaluations demonstrating that SoccerMaster consistently outperforms task-specific expert models across diverse downstream tasks, highlighting both its breadth and its effectiveness. The data, code, and model will be made publicly available.
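The abstract does not spell out how the supervised losses from different tasks are combined during multi-task pretraining. Purely as an illustration, the sketch below assumes that each training sample carries labels only for the tasks its source dataset annotates, that each task's loss is averaged over the samples that actually have a label for it, and that the per-task averages are summed with fixed weights. The function name `multitask_loss`, the task keys, and the weights are hypothetical and are not taken from SoccerMaster.

```python
from typing import Dict, List, Optional


def multitask_loss(
    per_sample_losses: List[Dict[str, Optional[float]]],
    task_weights: Dict[str, float],
) -> float:
    """Hypothetical aggregation of supervised losses for multi-task pretraining.

    Each sample comes from one source dataset and so only has labels for some
    tasks; a missing label is recorded as None and skipped. Each task's loss is
    averaged over the samples that have that label, and the per-task means are
    combined with the given weights (default weight 1.0).
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for sample in per_sample_losses:
        for task, loss in sample.items():
            if loss is None:
                continue  # no annotation for this task in the sample's source dataset
            sums[task] = sums.get(task, 0.0) + loss
            counts[task] = counts.get(task, 0) + 1
    total = 0.0
    for task, task_sum in sums.items():
        total += task_weights.get(task, 1.0) * task_sum / counts[task]
    return total


# Usage: a mixed batch drawn from datasets with different annotations.
batch = [
    {"detection": 0.8, "keypoints": 0.4, "event": None},
    {"detection": None, "keypoints": None, "event": 1.2},
    {"detection": 0.6, "keypoints": 0.2, "event": None},
]
weights = {"detection": 1.0, "keypoints": 0.5, "event": 1.0}
total = multitask_loss(batch, weights)
```

Here the result is about 2.05: 0.7 for detection, 0.5 × 0.3 = 0.15 for keypoints, and 1.2 for event classification. Skipping missing labels, rather than counting them as zero loss, keeps a dataset that lacks a given annotation from diluting that task's average.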

Haolin Yang, Jiayuan Rao, Haoning Wu, Weidi Xie (2025)

Related benchmarks

Task	Dataset	Metric	Result	
Multi-Object Tracking	SoccerNet (test)	HOTA	59.1	27
Camera Calibration	SoccerNet 2022 (test-center)	Junction Accuracy (5px tolerance)	76.9	10
Game State Reconstruction	SoccerNet-GSR (test)	GS-HOTA	64.1	9
Camera Calibration	SoccerNet 2023 (test)	JaC (5px)	71.1	7
Athlete Detection	Soccer Pretraining Dataset	AP@50	92.3	6
Keypoints Detection	Soccer Pretraining Dataset	Accuracy	95.2	6
Lines Detection	Soccer Pretraining Dataset	Accuracy	95.9	6
Event Classification	Soccer Pretraining Dataset	Accuracy	0.738	4
Commentary Generation	SN-Caption (test-align)	BLEU@1	31.3	3
Video-Commentary Alignment	Soccer Pretraining Dataset	Top-1 Accuracy	35	3
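Several rows in the table report accuracy under a pixel tolerance, for example Junction Accuracy (5px tolerance) for camera calibration. The sketch below illustrates only the general idea of such a metric: a predicted point counts as correct when it lies within the tolerance of its ground-truth point. It is a simplified illustration under stated assumptions, not the official SoccerNet evaluation code: points are assumed to be already matched by index, a missing prediction counts as a miss, and the function name `tolerance_accuracy` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def tolerance_accuracy(
    predicted: Sequence[Optional[Point]],
    ground_truth: Sequence[Point],
    tolerance_px: float = 5.0,
) -> float:
    """Fraction of ground-truth points whose prediction lies within tolerance_px.

    Simplified, hypothetical metric: predictions are matched to ground truth by
    index, and a missing prediction (None) counts as a miss.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not ground_truth:
        return 0.0
    hits = 0
    for pred, gt in zip(predicted, ground_truth):
        # Euclidean pixel distance; a hit if within the tolerance (inclusive).
        if pred is not None and math.dist(pred, gt) <= tolerance_px:
            hits += 1
    return hits / len(ground_truth)


# Usage: four annotated points, one of which has no prediction.
gt = [(100.0, 200.0), (300.0, 50.0), (640.0, 360.0), (10.0, 10.0)]
pred = [(103.0, 204.0), (310.0, 50.0), (640.0, 361.0), None]
score = tolerance_accuracy(pred, gt)
```

With a 5px tolerance, the first point (exactly 5px away) and the third (1px away) count as hits, while the second (10px away) and the missing fourth do not, so the score is 0.5. Raising the tolerance to 10px would also accept the second point.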
