CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning

About

Understanding camera dynamics is a fundamental pillar of video spatial intelligence. However, existing multimodal models predominantly treat this task as a black-box classification, often confusing physically distinct motions by relying on superficial visual patterns rather than geometric cues. We present CamReasoner, a framework that reformulates camera movement understanding as a structured inference process to bridge the gap between perception and cinematic logic. Our approach centers on the Observation-Thinking-Answer (O-T-A) paradigm, which compels the model to articulate spatio-temporal observations and reason about motion patterns within an explicit reasoning block. To instill this capability, we construct a Large-scale Inference Trajectory Suite comprising 18k SFT reasoning chains and 38k RL feedback samples. To the best of our knowledge, we are the first to employ RL for logical alignment in camera movement understanding, ensuring motion inferences are grounded in structured visual reasoning rather than contextual guesswork. Built upon Qwen2.5-VL-7B, CamReasoner-7B improves binary classification accuracy from 73.8% to 78.4% and VQA accuracy from 60.9% to 74.5% over its backbone, consistently outperforming both proprietary and open-source baselines across multiple benchmarks.

Hang Wu, Yujun Cai, Zehao Li, Haonan Ge, Bowen Sun, Junsong Yuan, Yiwei Wang • 2026
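
As a rough illustration of the Observation-Thinking-Answer structure described in the abstract, the sketch below splits a model response into its three stages and rejects responses that skip or reorder them. The tag names (`<observation>`, `<think>`, `<answer>`), the function name `parse_ota`, and the sample response are assumptions made for this example; the abstract names the three stages but does not specify their markup.

```python
import re
from typing import Optional

# Hypothetical markup: the abstract names the three O-T-A stages,
# but the tag names used here are assumed for illustration only.
OTA_PATTERN = re.compile(
    r"<observation>(?P<observation>.*?)</observation>\s*"
    r"<think>(?P<thinking>.*?)</think>\s*"
    r"<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def parse_ota(response: str) -> Optional[dict]:
    """Split a response into its three stages.

    Returns None when a stage is missing or the stages are out of order.
    """
    match = OTA_PATTERN.fullmatch(response.strip())
    if match is None:
        return None
    return {name: text.strip() for name, text in match.groupdict().items()}


# Made-up response, not taken from the paper's data.
sample = (
    "<observation>The background shifts left while the subject stays centered."
    "</observation>\n"
    "<think>Uniform leftward background motion suggests the camera moves right."
    "</think>\n"
    "<answer>truck right</answer>"
)

parsed = parse_ota(sample)
```

A structural check of this kind only confirms that the three stages are present and ordered; it says nothing about whether the reasoning or the answer is correct.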

Related benchmarks

Task	Dataset	Result
Camera movement understanding	CameraBench 10K-sample VQA subset 1.0 (test)	Translation (In) Error 68.7	24
Binary Question Answering	ACaM Synthetic Videos Binary QA (test)	Accuracy (Static) 74.07	23
Camera movement understanding	ACaM synthetic videos (test)	Static Accuracy 84.92	23
Video Multiple Choice Question Answering	ACaM real-world videos 1.0 (test)	Accuracy (Static) 76.5	23
Visual Question Answering	CameraBench	Motion Steadiness Accuracy 0.78	21

