Interaction-Consistent Object Removal via MLLM-Based Reasoning

About

Image-based object removal often erases only the named target, leaving behind interaction evidence that renders the result semantically inconsistent. We formalize this problem as Interaction-Consistent Object Removal (ICOR), which requires removing not only the target object but also associated interaction elements, such as lighting-dependent effects, physically connected objects, target-produced elements, and contextually linked objects. To address this task, we propose Reasoning-Enhanced Object Removal with MLLM (REORM), a reasoning-enhanced object removal framework that leverages multimodal large language models to infer which elements must be jointly removed. REORM features a modular design that integrates MLLM-driven analysis, mask-guided removal, and a self-correction mechanism, along with a local-deployment variant that supports accurate editing under limited resources. To support evaluation, we introduce ICOREval, a benchmark consisting of instruction-driven removals with rich interaction dependencies. On ICOREval, REORM outperforms state-of-the-art image editing systems, demonstrating its effectiveness in producing interaction-consistent results.
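
The three stages named above (MLLM-driven analysis, mask-guided removal, self-correction) could be wired together roughly as in the sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the names RemovalPlan, remove_with_self_correction, analyze, segment, inpaint, and find_leftovers are all hypothetical, the models are passed in as plain callables, and the retry limit max_rounds is an invented parameter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Stand-in types for this sketch; a real system would use image tensors.
Image = List[List[int]]
Mask = List[List[bool]]


@dataclass
class RemovalPlan:
    """Elements the reasoning step decided must be removed together."""
    target: str
    linked_elements: List[str]  # e.g. shadows, reflections, held objects


def remove_with_self_correction(
    image: Image,
    instruction: str,
    analyze: Callable[[Image, str], RemovalPlan],        # hypothetical MLLM analysis step
    segment: Callable[[Image, List[str]], Mask],         # hypothetical mask generator
    inpaint: Callable[[Image, Mask], Image],             # hypothetical mask-guided remover
    find_leftovers: Callable[[Image, RemovalPlan], List[str]],  # hypothetical checker
    max_rounds: int = 2,                                 # assumed retry budget
) -> Tuple[Image, RemovalPlan]:
    """Analyze once, then alternate removal and checking until nothing is left."""
    plan = analyze(image, instruction)
    to_remove = [plan.target, *plan.linked_elements]
    result = image
    for _ in range(max_rounds):
        mask = segment(result, to_remove)
        result = inpaint(result, mask)
        # Self-correction: ask which planned elements are still visible.
        leftovers = find_leftovers(result, plan)
        if not leftovers:
            break
        to_remove = leftovers
    return result, plan
```

Passing the models in as callables keeps the sketch independent of any particular MLLM, segmentation model, or inpainting backend, which loosely mirrors the modular design the abstract describes.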

Ching-Kai Huang, Wen-Chieh Lin, Yan-Cen Lee • 2026

Related benchmarks

Task            Dataset    Result             Rank
Object Removal  ICOREval   DINO Score: 93.7   6
