Code Reasoning on HumanEval base and extended (out-of-distribution)
[Chart: top Accuracy 0.677, InftyThink+, Feb 6, 2026. Metrics shown: Accuracy, Accuracy+, Avg Token Count, Latency.]
Updated 4d ago
Evaluation Results
Method       Setting                    Date     Accuracy  Accuracy+  Avg Token Count  Latency
InftyThink+  RL_Setting=Task reward...  2026.02  0.677     0.626      8.21             42.22
InftyThink+  RL_Setting=Task and ef...  2026.02  0.6742    0.6191     4.66             23.9
Vanilla      RL_Setting=Task reward...  2026.02  0.6044    0.5627     8.17             90.1
InftyThink+  RL_Setting=Cold start...   2026.02  0.5903    0.5434     5.02             27.5
Vanilla      RL_Setting=Cold start...   2026.02  0.5703    0.5252     6.58             65.89
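
The results above pair each InftyThink+ entry with a Vanilla entry under a similarly labeled setting, so the accuracy gain and latency ratio can be worked out directly. The following is a minimal sketch of that arithmetic, not part of the benchmark itself: the names `ROWS` and `compare` are hypothetical, the setting labels are copied in their truncated form, and it assumes that rows sharing the same truncated label refer to the same setting. The page does not state units for token count or latency, so the ratio is unitless.

```python
# Values copied from the Evaluation Results listing.
# Tuple layout (method, setting, accuracy, accuracy_plus, avg_tokens, latency)
# is an illustrative assumption, not a format defined by the page.
ROWS = [
    ("InftyThink+", "RL_Setting=Task reward...", 0.677, 0.626, 8.21, 42.22),
    ("InftyThink+", "RL_Setting=Task and ef...", 0.6742, 0.6191, 4.66, 23.9),
    ("Vanilla", "RL_Setting=Task reward...", 0.6044, 0.5627, 8.17, 90.1),
    ("InftyThink+", "RL_Setting=Cold start...", 0.5903, 0.5434, 5.02, 27.5),
    ("Vanilla", "RL_Setting=Cold start...", 0.5703, 0.5252, 6.58, 65.89),
]


def compare(setting):
    """Return (accuracy gain, latency ratio) of InftyThink+ relative to Vanilla.

    Assumes exactly one InftyThink+ row and one Vanilla row carry this label.
    """
    by_method = {
        method: (acc, latency)
        for method, row_setting, acc, _, _, latency in ROWS
        if row_setting == setting
    }
    acc_inf, lat_inf = by_method["InftyThink+"]
    acc_van, lat_van = by_method["Vanilla"]
    return acc_inf - acc_van, lat_inf / lat_van


for label in ("RL_Setting=Task reward...", "RL_Setting=Cold start..."):
    gain, ratio = compare(label)
    print(f"{label}: accuracy gain {gain:+.4f}, latency ratio {ratio:.2f}")
```

A latency ratio below 1 means the InftyThink+ entry reports lower latency than the Vanilla entry with the same label. The "Task and ef..." row has no Vanilla counterpart in the listing, so it is left out of the comparison.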