
DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning

About

Sports video understanding requires perceiving high-speed dynamics, complex rules, and long temporal contexts. Yet, current Multimodal Large Language Models (MLLMs) remain narrowly focused on single sports, specific tasks, or training-free paradigms. We introduce DeepSport, the first end-to-end trained MLLM for multi-task, multi-sport video understanding. DeepSport shifts from passive frame processing to active, iterative reasoning, dynamically extracting frames to "think with videos." To train our model, we curate a unified 78k-sample dataset via a rigorous three-step text-and-vision distillation pipeline. We then employ a progressive two-stage training strategy: a Sports Curriculum Supervised Fine-Tuning phase to build foundational perception, followed by Agentic Reinforcement Learning with a novel tool-use reward. Extensive experiments on a comprehensive 6.7k benchmark demonstrate that DeepSport achieves state-of-the-art performance, outperforming powerful proprietary and open-source models, while utilizing significantly fewer frames. Furthermore, it exhibits strong zero-shot transferability to unseen sports and broad motion recognition tasks, establishing a highly efficient and generalized foundation for complex video reasoning.

Junbo Zou, Haotian Xia, Zhen Ye, Shengjie Zhang, Christopher Lai, Vicente Ordonez, Weining Shen, Hanjie Chen • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Understanding | LVBench | -- | 67
Sports Video Understanding | DeepSport (test) | Fine-Grained Recognition Accuracy 49.89 | 13
Motion Understanding | MotionBench | Accuracy 48.5 | 12
General Video Understanding | LongVideoBench | Accuracy 45.9 | 4
Action & Motion Recognition | DREAM 1k | F1 Score 30.5 | 2
Action & Motion Recognition | ActionAtlas (unseen sports) | Accuracy 27.2 | 2
General Video Understanding | VideoMME Long | Accuracy 40.4 | 2
