Robot Failure Detection on BridgeData Fail V2
[Chart: Execution Accuracy (Planning Accuracy also selectable). Highlighted point: Guardian-8B-Thinking, 85, Dec 1, 2025.]
Updated 4d ago
Evaluation Results
Method                Model Category  Date     Execution Accuracy  Planning Accuracy
Guardian-8B-Thinking  Special...      2025.12  85                  91
Qwen3-VL-235B-A22B    Large-s...      2025.12  75                  86
GPT4.1                Large-s...      2025.12  72                  86
InternVL3-8B          Special...      2025.12  66                  71
CLIP+MLP              Special...      2025.12  58                  54
Sentinel              Special...      2025.12  57                  -
Cosmos-Reason1-7B     Special...      2025.12  53                  61