Code Reasoning on MBPP base and extended (out-of-distribution)
[Chart: Accuracy by date. Best result: InftyThink+, 55.83 Accuracy (Feb 6, 2026). Selectable metrics: Accuracy, Accuracy+, Tokens Used, Latency.]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Accuracy | Accuracy+ | Tokens Used | Latency |
| --- | --- | --- | --- | --- | --- | --- |
| InftyThink+ | RL_Setting=Task reward... | 2026.02 | 55.83 | 47.38 | 9.27 | 49.19 |
| InftyThink+ | RL_Setting=Task and ef... | 2026.02 | 53.85 | 45.92 | 4.64 | 23.62 |
| Vanilla | RL_Setting=Task reward... | 2026.02 | 50.09 | 45.43 | 6.69 | 80.36 |
| InftyThink+ | RL_Setting=Cold start... | 2026.02 | 49.28 | 42.05 | 4.73 | 26.36 |
| Vanilla | RL_Setting=Cold start... | 2026.02 | 48.07 | 41.01 | 5.48 | 62.45 |
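The page lets you switch between Accuracy, Accuracy+, Tokens Used, and Latency. A minimal Python sketch of the same re-ranking over the five rows listed above might look like the following. The `Result` type, the field names, and the `rank_by` helper are hypothetical names introduced for illustration; the setting labels are kept exactly as truncated on the page, and treating lower Tokens Used and Latency as better is an assumption, since the page states neither units nor direction.

```python
from typing import NamedTuple


class Result(NamedTuple):
    """One leaderboard row (hypothetical structure, values copied from the page)."""
    method: str
    setting: str  # truncated on the page; kept verbatim, not expanded
    accuracy: float
    accuracy_plus: float
    tokens_used: float  # unit not stated on the page
    latency: float      # unit not stated on the page


RESULTS = [
    Result("InftyThink+", "RL_Setting=Task reward...", 55.83, 47.38, 9.27, 49.19),
    Result("InftyThink+", "RL_Setting=Task and ef...", 53.85, 45.92, 4.64, 23.62),
    Result("Vanilla", "RL_Setting=Task reward...", 50.09, 45.43, 6.69, 80.36),
    Result("InftyThink+", "RL_Setting=Cold start...", 49.28, 42.05, 4.73, 26.36),
    Result("Vanilla", "RL_Setting=Cold start...", 48.07, 41.01, 5.48, 62.45),
]


def rank_by(metric: str, higher_is_better: bool = True) -> list[Result]:
    """Return all rows sorted by the named field, best first."""
    return sorted(
        RESULTS,
        key=lambda row: getattr(row, metric),
        reverse=higher_is_better,
    )


if __name__ == "__main__":
    # Assumption: lower latency is better, so sort ascending.
    for row in rank_by("latency", higher_is_better=False):
        print(f"{row.method:12} {row.setting:27} latency={row.latency}")
```

Ranking by `accuracy` reproduces the order shown on the page, while ranking by `latency` or `tokens_used` puts a different InftyThink+ row first, which is the trade-off the four metric columns expose.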