LogicEnvGen: Task-Logic Driven Generation of Diverse Simulated Environments for Embodied AI

About

Simulated environments play an essential role in embodied AI, functionally analogous to test cases in software engineering. However, existing environment generation methods often emphasize visual realism (e.g., object diversity and layout coherence), overlooking a crucial aspect: logical diversity from the testing perspective. This limits the comprehensive evaluation of agent adaptability and planning robustness in distinct simulated environments. To bridge this gap, we propose LogicEnvGen, a novel method driven by Large Language Models (LLMs) that adopts a top-down paradigm to generate logically diverse simulated environments as test cases for agents. Given an agent task, LogicEnvGen first analyzes its execution logic to construct decision-tree-structured behavior plans and then synthesizes a set of logical trajectories. Subsequently, it adopts a heuristic algorithm to refine the trajectory set, reducing redundant simulation. For each logical trajectory, which represents a potential task situation, LogicEnvGen correspondingly instantiates a concrete environment. Notably, it employs constraint solving for physical plausibility. Furthermore, we introduce LogicEnvEval, a novel benchmark comprising four quantitative metrics for environment evaluation. Experimental results verify the lack of logical diversity in baselines and demonstrate that LogicEnvGen achieves 1.04-2.61x greater diversity, significantly improving the performance in revealing agent faults by 4.00%-68.00%.
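The trajectory steps described above can be illustrated with a small sketch. It is not the authors' implementation: the `Condition` and `Action` node types, the example task, and the greedy branch-coverage reduction are all assumptions made for illustration, since the abstract does not specify the heuristic. The sketch enumerates every root-to-leaf path of a decision-tree-structured plan as a logical trajectory, then keeps a smaller subset that still exercises each branch outcome at least once.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Action:
    """Leaf node: what the agent does in this situation (hypothetical type)."""
    name: str


@dataclass
class Condition:
    """Internal node: a yes/no check on the environment (hypothetical type)."""
    question: str
    if_true: "Node"
    if_false: "Node"


Node = Union[Action, Condition]


def enumerate_trajectories(node, prefix=()):
    """Return one (branch_choices, action_name) pair per root-to-leaf path."""
    if isinstance(node, Action):
        return [(prefix, node.name)]
    return (
        enumerate_trajectories(node.if_true, prefix + ((node.question, True),))
        + enumerate_trajectories(node.if_false, prefix + ((node.question, False),))
    )


def greedy_reduce(trajectories):
    """Assumed heuristic: greedy set cover over (question, outcome) branches."""
    uncovered = {branch for choices, _ in trajectories for branch in choices}
    if not uncovered:
        # A plan with no conditions has a single trajectory; keep it.
        return trajectories[:1]
    kept = []
    while uncovered:
        # Pick the trajectory that covers the most still-uncovered branches.
        best = max(trajectories, key=lambda t: len(uncovered & set(t[0])))
        kept.append(best)
        uncovered -= set(best[0])
    return kept


def tap_subtree(cup_state):
    """Build the second-level check shared by both cup states."""
    return Condition(
        "tap reachable?",
        Action(f"fill {cup_state} cup at tap"),
        Action(f"fetch bottle, fill {cup_state} cup"),
    )


# Hypothetical task: "bring a cup of water".
tree = Condition("cup clean?", tap_subtree("clean"), tap_subtree("washed"))

all_trajs = enumerate_trajectories(tree)
kept = greedy_reduce(all_trajs)

print(len(all_trajs), "trajectories reduced to", len(kept))
for choices, action in kept:
    print(choices, "->", action)
```

In this toy plan the four enumerated trajectories reduce to two while still covering both outcomes of each condition. Each kept trajectory would then be instantiated as one concrete environment, a step (including the constraint solving for physical plausibility) that this sketch does not attempt.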

Jianan Wang, Siyang Zhang, Bin Li, Juan Chen, Jingtao Qi, Zhuo Zhang, Chen Qian • 2026

Related benchmarks

Task	Dataset	Result
Environment Generation	LogicEnvEval	Physics Pass Rate (Floor Plan): 100	12
3D Indoor Scene Synthesis	Human Evaluation Study Generated 3D Scenes	Overall Score: 2.046	4
Indoor Scene Layout Generation	3D Indoor Scenes	Functional Appropriateness: 2.88	4
Object Placement	LLaMA (seen)	Object Count: 41.4	4
Object Placement	Qwen (unseen)	Object Count (CNT): 15.68	4
Object Placement	Mistral (unseen)	Object Count: 29.47	4
