Task Completion Classification on SARM (real-world rollouts)
[Chart: Average Accuracy leaderboard. Top result: GRM-8B, 92.8 (Dec 29, 2025). Metrics: Average Accuracy, Stacking Success Rate, Folding Success Rate, Clearing Success Rate.]
Evaluation Results
Method | Model Category | Date | Average Accuracy | Stacking Success Rate | Folding Success Rate | Clearing Success Rate
GRM-8B | RMs, Vi... | 2025.12 | 92.8 | - | - | -
GPT-5 | VLMs, V... | 2025.12 | 83.9 | - | - | -
GRM-8B | RMs, Vi... | 2025.12 | 83.9 | - | - | -
Gemini-2.5-Pro | VLMs, V... | 2025.12 | 81.1 | - | - | -
Qwen3-VL | VLMs, V... | 2025.12 | 76.7 | - | - | -
RoboBrain 2.0 | VLMs, V... | 2025.12 | 61.7 | - | - | -
GVL | RMs, Vi... | 2025.12 | 37.2 | - | - | -
VLAC-2B | RMs, Vi... | 2025.12 | 33.9 | - | - | -