Code Generation on HumanEval+ (SR, LLM%)
[Chart: SR and LLM% by method. Top result: LLM-only, 98.8 SR, May 15, 2026.]
Updated 15d ago
Evaluation Results
Method            Details                    Date     SR    LLM%
----------------  -------------------------  -------  ----  ----
LLM-only          Backbones=Averaged ove...  2026.05  98.8  100
Oracle Router     Backbones=Averaged ove...  2026.05  98.6  8.1
Heuristic Router  Backbones=Averaged ove...  2026.05  97.4  26.6
R2V               Backbones=Averaged ove...  2026.05  94.3  0.6
Entropy Router    Backbones=Averaged ove...  2026.05  92.9  0.8
SLM-only          Backbones=Averaged ove...  2026.05  91.9  0