Evaluator Accuracy on AndroidWorld
[Chart: Overall Acc, Sub-goal Acc (Completion), Sub-goal Acc (Step). Highlighted point: StepCritic, Overall Acc 87.9, Apr 27, 2025.]
Evaluation Results
Method               Date     Overall Acc  Sub-goal Acc (Completion)  Sub-goal Acc (Step)
StepCritic           2025.04  87.9         92.8                       82.3
Captioner + GPT-4    2025.04  84.6         -                          -
Captioner + Mixtral  2025.04  82.4         -                          -