SoccerMaster: A Vision Foundation Model for Soccer Understanding
About
Soccer understanding has recently attracted growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works that typically rely on isolated, task-specific expert models, this work proposes a unified model that handles diverse soccer visual understanding tasks, ranging from fine-grained perception (e.g., athlete detection) to semantic reasoning (e.g., event classification). Specifically, our contributions are threefold: (i) we present SoccerMaster, the first soccer-specific vision foundation model, which unifies diverse understanding tasks within a single framework via supervised multi-task pretraining; (ii) we develop an automated data curation pipeline that generates scalable spatial annotations, and we integrate these annotations with various existing soccer video datasets to construct SoccerFactory, a comprehensive pretraining data resource; and (iii) we conduct extensive evaluations showing that SoccerMaster consistently outperforms task-specific expert models across diverse downstream tasks, highlighting its breadth and superiority. The data, code, and model will be made publicly available.
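Supervised multi-task pretraining on a merged resource like SoccerFactory implies that each training sample carries labels only for the tasks its source dataset annotates. The sketch below shows one common way to combine per-task losses in that setting: each task's loss is averaged only over the samples annotated for it, and tasks are then combined as a weighted sum. This is a minimal illustration under stated assumptions, not SoccerMaster's actual training objective. The task names, the scalar predictions, the loss functions, and the weights are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical loss signature: (prediction, target) -> scalar loss.
LossFn = Callable[[float, float], float]


def squared_error(pred: float, target: float) -> float:
    return (pred - target) ** 2


def absolute_error(pred: float, target: float) -> float:
    return abs(pred - target)


@dataclass
class Sample:
    # Task name -> target. A task is absent when the sample's source
    # dataset provides no annotation for it (assumption for illustration).
    targets: Dict[str, float]


def multitask_loss(
    predictions: List[Dict[str, float]],
    samples: List[Sample],
    loss_fns: Dict[str, LossFn],
    weights: Dict[str, float],
) -> float:
    """Weighted sum over tasks of the mean loss over annotated samples only."""
    total = 0.0
    for task, fn in loss_fns.items():
        # Unannotated samples are masked out; the condition is checked
        # before pred[task] is evaluated.
        losses = [
            fn(pred[task], s.targets[task])
            for pred, s in zip(predictions, samples)
            if task in s.targets
        ]
        if losses:  # skip tasks with no annotated sample in this batch
            total += weights.get(task, 1.0) * sum(losses) / len(losses)
    return total


if __name__ == "__main__":
    # Hypothetical mixed batch drawn from datasets with different labels.
    batch = [
        Sample({"detection": 1.0}),
        Sample({"detection": 0.0, "event": 2.0}),
        Sample({"event": 1.0}),
    ]
    preds = [
        {"detection": 0.5, "event": 0.0},
        {"detection": 0.0, "event": 1.0},
        {"detection": 0.9, "event": 1.0},
    ]
    fns = {"detection": squared_error, "event": absolute_error}
    print(multitask_loss(preds, batch, fns, {"detection": 1.0, "event": 2.0}))
```

Here the detection loss averages over the first two samples (0.125) and the event loss averages over the last two (0.5, weighted by 2), so the script prints 1.125. Averaging per task rather than per batch keeps a task's contribution from shrinking when few samples in a batch happen to be annotated for it.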
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-Object Tracking | SoccerNet (test) | HOTA | 59.1 | 23 |
| Athlete Detection | Soccer Pretraining Dataset | AP@50 | 92.3 | 6 |
| Keypoints Detection | Soccer Pretraining Dataset | Accuracy | 95.2 | 6 |
| Lines Detection | Soccer Pretraining Dataset | Accuracy | 95.9 | 6 |
| Event Classification | Soccer Pretraining Dataset | Accuracy | 0.738 | 4 |
| Game State Reconstruction | SoccerNet-GSR (test) | GS-HOTA | 64.1 | 4 |
| Camera Calibration | SoccerNet 2022 (test-center) | Junction Accuracy (5 px tolerance) | 76.9 | 4 |
| Camera Calibration | SoccerNet 2023 (test) | JaC (5 px) | 71.1 | 4 |
| Commentary Generation | SN-Caption (test-align) | BLEU@1 | 31.3 | 3 |
| Video-Commentary Alignment | Soccer Pretraining Dataset | Top-1 Accuracy | 35 | 3 |