Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

ReaMOT: A Benchmark and Framework for Reasoning-based Multi-Object Tracking

About

Referring Multi-Object Tracking (RMOT) aims to track targets specified by language instructions. However, existing RMOT paradigms heavily rely on explicit visual-textual matching and consequently fail to generalize to complex instructions that require logical reasoning. To overcome this, we propose Reasoning-based Multi-Object Tracking (ReaMOT), a novel task that elevates tracking to a cognitive level, requiring models to infer and track specific targets satisfying implicit constraints via logical reasoning. To advance this field, we construct the ReaMOT Challenge, a comprehensive benchmark featuring a tailored metric suite and a large scale dataset. This dataset comprises 1,156 language instructions, 423,359 image language pairs, and 869 distinct video sequences systematically categorized into six distinct evaluation scenarios, with over 75\% of the instructions dedicated to High Level Reasoning. Furthermore, recognizing that traditional trackers lack cognitive capacity while direct application of Large Vision-Language Model (LVLM) yields severe temporal inconsistencies, we propose ReaTrack. Driven by the insight to decouple high-level cognitive localization from low-level physical motion continuity, this training-free framework dynamically aligns the semantic detections of a Thinking-variant LVLM with the robust motion priors of SAM2. Extensive experiments on the ReaMOT Challenge benchmark demonstrate that ReaTrack establishes a new leading performance standard. Notably, it achieves a more than threefold improvement in RHOTA on the High Level Reasoning subset. Our dataset and code will be available at https://github.com/chen-si-jia/ReaMOT.

Sijia Chen, Yanqiu Yu, En Yu, Wenbing Tao• 2025

Related benchmarks

TaskDatasetResultRank
Referring Multi-Object TrackingReaMOT Challenge High-Level Reasoning
RHOTA42.28
7
Referring Multi-Object TrackingReaMOT Challenge Low-Level Perception
RHOTA31.55
7
Showing 2 of 2 rows

Other info

Follow for update