Code Generation on HumanEval (Accuracy)
Chart: Accuracy over time (top result 54.27, Llama3.1-8B-Instruct; latest point Jan 30, 2026)
Evaluation Results
Model                 Configuration              Date     Accuracy (%)
Llama3.1-8B-Instruct  Type=Autoregressive        2026.01  54.27
Qwen2.5-7B-Instruct   Type=Autoregressive        2026.01  52.44
LLaDA1.5-8B           Decoding Strategy=Four...  2026.01  43.29
LLaDA1.5-8B           Decoding Strategy=Vanilla  2026.01  40.85
LLaDA-8B-Instruct     Decoding Strategy=Four...  2026.01  40.85
LLaDA1.5-8B           Decoding Strategy=RWS      2026.01  39.63
LLaDA-8B-Instruct     Decoding Strategy=Vanilla  2026.01  39.63
LLaDA-8B-Instruct     Decoding Strategy=PC-S...  2026.01  39.63
LLaDA1.5-8B           Decoding Strategy=PC-S...  2026.01  39.02
LLaDA-8B-Instruct     Decoding Strategy=RWS      2026.01  39.02
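Every score in the evaluation results lines up exactly with a whole number of solved problems if it is read as a percentage of the 164 problems in HumanEval. That reading is an assumption on our part, since the page labels the metric only as "Accuracy". The short Python sketch below re-enters the rows, recovers the implied number of solved problems, and computes each decoding strategy's change relative to the Vanilla decoder for the same LLaDA model. The names `ROWS`, `solved_count`, and `delta_vs_vanilla` are hypothetical helpers introduced for illustration, and the truncated strategy names ("Four...", "PC-S...") are kept exactly as listed.

```python
"""Sanity-check the HumanEval leaderboard rows (illustrative sketch).

Assumption: each Accuracy value is a percentage of the 164 HumanEval problems.
"""

HUMANEVAL_PROBLEMS = 164  # size of the standard HumanEval problem set
VANILLA = "Decoding Strategy=Vanilla"

# (model, configuration, accuracy) copied from the evaluation results;
# truncated names such as "Four..." and "PC-S..." are kept verbatim.
ROWS = [
    ("Llama3.1-8B-Instruct", "Type=Autoregressive", 54.27),
    ("Qwen2.5-7B-Instruct", "Type=Autoregressive", 52.44),
    ("LLaDA1.5-8B", "Decoding Strategy=Four...", 43.29),
    ("LLaDA1.5-8B", VANILLA, 40.85),
    ("LLaDA-8B-Instruct", "Decoding Strategy=Four...", 40.85),
    ("LLaDA1.5-8B", "Decoding Strategy=RWS", 39.63),
    ("LLaDA-8B-Instruct", VANILLA, 39.63),
    ("LLaDA-8B-Instruct", "Decoding Strategy=PC-S...", 39.63),
    ("LLaDA1.5-8B", "Decoding Strategy=PC-S...", 39.02),
    ("LLaDA-8B-Instruct", "Decoding Strategy=RWS", 39.02),
]


def solved_count(accuracy: float, n: int = HUMANEVAL_PROBLEMS) -> int:
    """Hypothetical helper: nearest whole number of solved problems for a score."""
    return round(accuracy / 100 * n)


def delta_vs_vanilla(rows):
    """Hypothetical helper: change in points versus the same model's Vanilla row.

    Models without a Vanilla row (the autoregressive baselines) are skipped.
    """
    vanilla = {model: acc for model, cfg, acc in rows if cfg == VANILLA}
    return {
        (model, cfg): round(acc - vanilla[model], 2)
        for model, cfg, acc in rows
        if model in vanilla and cfg != VANILLA
    }


if __name__ == "__main__":
    for model, cfg, acc in ROWS:
        k = solved_count(acc)
        # The reading holds only if k/164, rounded to two decimals, gives back the score.
        assert round(100 * k / HUMANEVAL_PROBLEMS, 2) == acc
        print(f"{model:22}{cfg:27}{acc:6.2f}  = {k}/{HUMANEVAL_PROBLEMS}")
    for (model, cfg), delta in delta_vs_vanilla(ROWS).items():
        print(f"{model:22}{cfg:27}{delta:+6.2f} vs Vanilla")
```

Under this reading, the best LLaDA1.5-8B result (43.29) corresponds to 71/164, four problems more than its Vanilla decoding (40.85, or 67/164), while the top autoregressive entry (54.27) corresponds to 89/164.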