
LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI

About

Simulated environments play an essential role in embodied AI, functionally analogous to test cases in software engineering. However, existing environment generation methods often emphasize visual realism (e.g., object diversity and layout coherence), overlooking a crucial aspect: logical diversity from the testing perspective. This limits the comprehensive evaluation of agent adaptability and planning robustness in distinct simulated environments. To bridge this gap, we propose LogicEnvGen, a novel method driven by Large Language Models (LLMs) that adopts a top-down paradigm to generate logically diverse simulated environments as test cases for agents. Given an agent task, LogicEnvGen first analyzes its execution logic to construct decision-tree-structured behavior plans and then synthesizes a set of logical trajectories. Subsequently, it adopts a heuristic algorithm to refine the trajectory set, reducing redundant simulation. For each logical trajectory, which represents a potential task situation, LogicEnvGen correspondingly instantiates a concrete environment. Notably, it employs constraint solving for physical plausibility. Furthermore, we introduce LogicEnvEval, a novel benchmark comprising four quantitative metrics for environment evaluation. Experimental results verify the lack of logical diversity in baselines and demonstrate that LogicEnvGen achieves 1.04-2.61x greater diversity, significantly improving the performance in revealing agent faults by 4.00%-68.00%.
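As a rough illustration of the "decision tree to logical trajectories" step described above, the following minimal sketch enumerates every root-to-leaf path of a small behavior plan, with each path standing for one potential task situation. It is not the authors' implementation: the `Action` and `Condition` classes, the `enumerate_trajectories` function, and the cup-fetching plan are all hypothetical names chosen for this example, and the heuristic refinement and constraint-solving stages are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Action:
    """Leaf of the behavior plan: what the agent is expected to do (hypothetical)."""
    name: str


@dataclass
class Condition:
    """Internal node: a predicate about the environment with two branches (hypothetical)."""
    predicate: str
    if_true: "Node"
    if_false: "Node"


Node = Union[Action, Condition]
# A trajectory is the sequence of (predicate, truth value) pairs on one path,
# paired with the action reached at the end of that path.
Trajectory = Tuple[Tuple[Tuple[str, bool], ...], str]


def enumerate_trajectories(node: Node, prefix: tuple = ()) -> List[Trajectory]:
    """Return one trajectory per root-to-leaf path, true branches first."""
    if isinstance(node, Action):
        return [(prefix, node.name)]
    true_paths = enumerate_trajectories(
        node.if_true, prefix + ((node.predicate, True),)
    )
    false_paths = enumerate_trajectories(
        node.if_false, prefix + ((node.predicate, False),)
    )
    return true_paths + false_paths


# Assumed example task: "bring me the cup".
plan = Condition(
    "cup_on_table",
    if_true=Action("pick_up_cup"),
    if_false=Condition(
        "cup_in_cabinet",
        if_true=Action("open_cabinet_then_pick_up_cup"),
        if_false=Action("report_cup_missing"),
    ),
)

trajectories = enumerate_trajectories(plan)
for conditions, action in trajectories:
    print(dict(conditions), "->", action)
```

Each printed line pairs a set of environment conditions with the expected behavior. Under this reading, an environment generator would need to build one concrete scene per trajectory (for example, one with the cup on the table, one with it in the cabinet, and one with no cup at all) so that every branch of the plan is exercised.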

Jianan Wang, Siyang Zhang, Bin Li, Juan Chen, Jingtao Qi, Zhuo Zhang, Chen Qian • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Environment Generation | LogicEnvEval | Physics Pass Rate (Floor Plan) | 100 | 12 |
| 3D Indoor Scene Synthesis | Human Evaluation Study Generated 3D Scenes | Overall Score | 2.046 | 4 |
| Indoor Scene Layout Generation | 3D Indoor Scenes | Functional Appropriateness | 2.88 | 4 |
| Object Placement | LLaMA (seen) | Object Count | 41.4 | 4 |
| Object Placement | Qwen (unseen) | Object Count (CNT) | 15.68 | 4 |
| Object Placement | Mistral (unseen) | Object Count | 29.47 | 4 |