| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Embodied Agent Planning (Adversarial Safety Evaluation) | SafeAgentBench Unsafe Tasks - Jailbreak | SR 54.85 | 12 |
| Embodied Agent Planning (Safety Evaluation) | SafeAgentBench Unsafe Tasks | Success Rate 59.87 | 12 |
| Embodied Agent Planning | SafeAgentBench Safe Tasks | Success Rate 75.25 | 12 |
| Risk Identification | SafeAgentBench | RIR 80.77 | 12 |
| Safe Agent Planning | SafeAgentBench 1.0 (test) | Harm Rate (HAR) 0 | 10 |
| Safe Agent Evaluation | SafeAgentBench Kitchen N=210 | HAR (%) 0 | 10 |