SceneDiff: A Benchmark and Method for Multiview Object Change Detection
About
We investigate the problem of identifying objects that have been added, removed, or moved between a pair of captures (images or videos) of the same scene at different times. Accurately identifying verifiable changes is extremely challenging: some objects may appear to be missing because they are occluded or out of frame, while others may appear different due to large viewpoint changes. To study this problem, we introduce the SceneDiff Benchmark, the first multiview change detection dataset for scenes captured along different camera trajectories, comprising 350 diverse video pairs with dense object instance-level annotations. We also introduce the SceneDiff algorithm, a training-free approach that solves for image poses, segments images into objects, and compares them using semantic and geometric features. By building on pretrained models, SceneDiff generalizes across domains without retraining and naturally improves as the underlying models advance. Experiments on multiview and two-view benchmarks demonstrate that our method outperforms existing approaches by large margins (53.0% and 30.6% relative AP improvements). Project page: https://yuqunw.github.io/SceneDiff
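The comparison step described above (match objects across the two captures, then use semantic and geometric cues to label them) can be illustrated with a minimal, hypothetical sketch. This is not the SceneDiff implementation. It assumes each object already has a semantic embedding and a 3D centroid in a shared world frame (i.e. poses are solved and objects segmented). The names `ObjectInstance` and `detect_changes`, the greedy matching (a stand-in for any assignment method), and the thresholds `sim_thresh` and `move_thresh` are all illustrative assumptions. It also omits the visibility check the text highlights: a real system must confirm that an unmatched object's location was actually observed, not merely occluded or out of frame.

```python
import math
from dataclasses import dataclass


@dataclass
class ObjectInstance:
    # Hypothetical per-object record: semantic embedding plus a 3D centroid
    # in a shared world frame (assumes camera poses are already solved).
    obj_id: str
    feature: list[float]
    centroid: tuple[float, float, float]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embeddings; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def detect_changes(before, after, sim_thresh=0.8, move_thresh=0.1):
    """Hypothetical sketch: label objects as added, removed, or moved."""
    # Score every before/after pair by semantic similarity, best first.
    pairs = sorted(
        ((cosine(b.feature, a.feature), i, j)
         for i, b in enumerate(before)
         for j, a in enumerate(after)),
        reverse=True,
    )
    matched_b, matched_a, changes = set(), set(), {}
    # Greedy one-to-one matching above the similarity threshold.
    for sim, i, j in pairs:
        if sim < sim_thresh:
            break
        if i in matched_b or j in matched_a:
            continue
        matched_b.add(i)
        matched_a.add(j)
        # Geometric check: a matched object whose centroid shifted is "moved".
        if math.dist(before[i].centroid, after[j].centroid) > move_thresh:
            changes[before[i].obj_id] = "moved"
    # Unmatched objects in the first capture are treated as removed
    # (a real system would first verify they were visible, not occluded).
    for i, b in enumerate(before):
        if i not in matched_b:
            changes[b.obj_id] = "removed"
    # Unmatched objects in the second capture are treated as added.
    for j, a in enumerate(after):
        if j not in matched_a:
            changes[a.obj_id] = "added"
    return changes


before = [
    ObjectInstance("mug", [1.0, 0.0, 0.0], (0.0, 0.0, 0.0)),
    ObjectInstance("book", [0.0, 1.0, 0.0], (1.0, 0.0, 0.0)),
]
after = [
    ObjectInstance("mug_after", [1.0, 0.0, 0.0], (0.5, 0.0, 0.0)),
    ObjectInstance("plant", [0.0, 0.0, 1.0], (2.0, 0.0, 0.0)),
]
print(detect_changes(before, after))
```

In this toy example the mug matches semantically but its centroid shifts by 0.5 units, so it is labeled moved; the book has no match and is labeled removed; the plant is labeled added.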
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Semantic Change Detection | ChangeSim | -- | -- | 25 |
| Multiview Change Detection | SceneDiff SD-V 1.0 (test) | AP (Per-View) | 49.6 | 4 |
| Multiview Change Detection | SceneDiff SD-K 1.0 (test) | Per-View AP | 23.6 | 4 |
| Two-View Change Detection | RC-3D | AP50 (Both) | 68.7 | 3 |