Code-writing on HumanEval & MBPP EvalPlus (test)
[Chart: HumanEval Pass Rate by date. Highlighted point: CRITIQ, 39.02, Feb 26, 2025. Selectable metrics: HumanEval, HumanEval+, MBPP, MBPP+, Avg., and Avg.+ Pass Rate.]
Evaluation Results
| Method | Details | Date | HumanEval Pass Rate | HumanEval+ Pass Rate | MBPP Pass Rate | MBPP+ Pass Rate | Avg. Pass Rate | Avg.+ Pass Rate |
|---|---|---|---|---|---|---|---|---|
| CRITIQ | Backbone=Llama-3.2-3B,... | 2025.02 | 39.02 | 33.54 | 68.73 | 48.41 | 53.88 | 40.98 |
| QR-Edu | Backbone=Llama-3.2-3B,... | 2025.02 | 36.59 | 32.32 | 59.26 | 47.35 | 47.93 | 39.84 |
| Stack | Backbone=Llama-3.2-3B,... | 2025.02 | 31.71 | 27.44 | 56.61 | 46.3 | 44.16 | 36.87 |
| Raw | Backbone=Llama-3.2-3B,... | 2025.02 | 28.66 | 25.61 | 48.94 | 39.15 | 38.8 | 32.38 |
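
The page does not say how the Avg. and Avg.+ columns are computed. The reported values are consistent with a plain arithmetic mean of the HumanEval and MBPP pass rates (and of HumanEval+ and MBPP+ for Avg.+). The sketch below checks that reading against the four rows. The names `ROWS`, `mean2`, and `check_rows` are hypothetical helpers for illustration, and rounding half up to two decimals is an assumption about how the site rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

# Values copied from the results table as strings, so Decimal parses them exactly:
# (method, HumanEval, HumanEval+, MBPP, MBPP+, Avg., Avg.+)
ROWS = [
    ("CRITIQ", "39.02", "33.54", "68.73", "48.41", "53.88", "40.98"),
    ("QR-Edu", "36.59", "32.32", "59.26", "47.35", "47.93", "39.84"),
    ("Stack",  "31.71", "27.44", "56.61", "46.3",  "44.16", "36.87"),
    ("Raw",    "28.66", "25.61", "48.94", "39.15", "38.8",  "32.38"),
]


def mean2(a: str, b: str) -> Decimal:
    """Mean of two pass rates, rounded half up to two decimals (assumed rounding)."""
    mean = (Decimal(a) + Decimal(b)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def check_rows(rows) -> dict:
    """Map each method to True if both reported averages match the recomputed means."""
    results = {}
    for name, he, he_plus, mbpp, mbpp_plus, avg, avg_plus in rows:
        base_ok = mean2(he, mbpp) == Decimal(avg)
        plus_ok = mean2(he_plus, mbpp_plus) == Decimal(avg_plus)
        results[name] = base_ok and plus_ok
    return results


if __name__ == "__main__":
    for method, ok in check_rows(ROWS).items():
        print(f"{method}: averages {'match' if ok else 'do not match'} the two-benchmark mean")
```

Decimal arithmetic is used instead of floats so that half-way cases such as 53.875 round predictably.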