EZBlender: Efficient 3D Editing with Plan-and-ReAct Agent
About
As a cornerstone of the modern digital economy, 3D modeling and rendering demand substantial resources and manual effort when scene editing is performed in the traditional manner. Despite recent progress in VLM-based agents for 3D editing, the fundamental trade-off between editing precision and agent responsiveness remains unresolved. To address this trade-off, we present EZBlender, a Blender agent with a hybrid framework that combines planning-based task decomposition and reactive local autonomy for efficient human-AI collaboration and semantically faithful 3D editing. Specifically, this previously unexplored Plan-and-ReAct design not only preserves editing quality but also significantly reduces latency and computational cost. To further validate the efficiency and effectiveness of the proposed edge-autonomy architecture, we construct a dedicated multi-tasking benchmark, a setting that has not been systematically investigated in prior research. In addition, we provide a comprehensive analysis of language model preference, system responsiveness, and economic efficiency.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Scene Editing | 3D Scene Editing | Prompt Tokens | 4.62e+3 | 5 |
| 3D Scene Editing | 3D Editing Benchmark Scenario S1 | TCR (%) | 78.67 | 3 |
| 3D Scene Editing | 3D Editing Benchmark (Scenario S2) | TCR | 84.67 | 3 |
| 3D Scene Editing | 3D Editing Benchmark Scenario S3 | TCR | 60.66 | 3 |
| 3D Scene Editing | 3D Editing Benchmark Scenario S4 | TCR | 61.33 | 3 |
| 3D Scene Editing | 3D Editing Benchmark Scenario S5 | TCR | 58.66 | 3 |
| Text-Prompt Editing | BlenderGym (test) | Shapekey CLIP Score | 30.21 | 3 |
| Visual-Prompt Editing | BlenderGym VLM (test) | Blend Shape CLIP Sim | 0.9816 | 3 |
| 3D Scene Editing | 15 distinct single-task prompts | LLM Time | 20.58 | 3 |