Safe Planning for Embodied Agents on EMBGUARD (test)
Figure: Safe Precision over time (top result: Gemini-2.5-Pro, 95.1, May 29, 2026). Selectable metrics: Safe Precision, Safe Recall, Hazard Accuracy, Risk Type Accuracy.
Evaluation Results
Method          Model Scale  Date     Safe Precision  Safe Recall  Hazard Accuracy  Risk Type Accuracy
Gemini-2.5-Pro  -            2026.05  95.1            42.8         33.4             54.9
Qwen-3-VL-32B   32B          2026.05  92.4            66           28.7             50
EMBGUARD-4B     4B           2026.05  92.1            61.5         23.4             40.4
GPT-5.1         -            2026.05  91.4            71           35.6             34.5
EMBGUARD-2B     2B           2026.05  90.2            40           6.6              40.8

(- = model scale not reported)
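The leaderboard is ordered by Safe Precision, but the leading method changes with the metric. The following is a minimal Python sketch that ranks the methods on any one metric, using only the scores from the Evaluation Results table. The names `RESULTS`, `METRICS`, and `rank_by` are hypothetical and for illustration only; they are not part of any EMBGUARD tooling.

```python
# Scores copied from the Evaluation Results table (EMBGUARD test split).
# Hypothetical data layout: each tuple follows the order in METRICS.
METRICS = ("Safe Precision", "Safe Recall", "Hazard Accuracy", "Risk Type Accuracy")

RESULTS = {
    "Gemini-2.5-Pro": (95.1, 42.8, 33.4, 54.9),
    "Qwen-3-VL-32B": (92.4, 66.0, 28.7, 50.0),
    "EMBGUARD-4B": (92.1, 61.5, 23.4, 40.4),
    "GPT-5.1": (91.4, 71.0, 35.6, 34.5),
    "EMBGUARD-2B": (90.2, 40.0, 6.6, 40.8),
}


def rank_by(metric: str) -> list[tuple[str, float]]:
    """Return (method, score) pairs sorted from best to worst on one metric."""
    idx = METRICS.index(metric)  # raises ValueError for an unknown metric name
    rows = [(name, scores[idx]) for name, scores in RESULTS.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Print the top-scoring method for each of the four metrics.
    for metric in METRICS:
        leader, score = rank_by(metric)[0]
        print(f"{metric}: {leader} ({score})")
```

Running it shows that Gemini-2.5-Pro leads on Safe Precision and Risk Type Accuracy, while GPT-5.1 leads on Safe Recall and Hazard Accuracy.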