Code Generation on HumanEval (Accuracy, General Capability Average Accuracy)
[Chart: results over time for Accuracy and General Capability Average Accuracy; highest Accuracy 35.4 (Pre-trained), Jun 1, 2026]
Evaluation Results
Method          Model       Date     Accuracy  General Capability Average Accuracy
Pre-trained     Gemma-3-4B  2026.06  35.4      50.95
AlphaToken      Gemma-3-4B  2026.06  33.27     49.51
STM             Gemma-3-4B  2026.06  31.66     48.32
ssTOKEN         Gemma-3-4B  2026.06  31.16     48.05
XTF             Gemma-3-4B  2026.06  30.8      47.3
Token Cleaning  Gemma-3-4B  2026.06  30.58     47.94
LESS            Gemma-3-4B  2026.06  30.42     47.92
LoRA            Gemma-3-4B  2026.06  29.93     47.42
Standard FT     Gemma-3-4B  2026.06  28.9      44.2
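To compare the methods against a common reference, the scores can be re-expressed as gaps from the Standard FT row. The following is a minimal Python sketch: the `RESULTS` dictionary only transcribes the Evaluation Results table, and the helper name `gaps_vs` is hypothetical, not part of any benchmark tooling.

```python
# Scores transcribed from the Evaluation Results table (Gemma-3-4B, 2026.06).
# Each value is (Accuracy, General Capability Average Accuracy).
RESULTS = {
    "Pre-trained": (35.4, 50.95),
    "AlphaToken": (33.27, 49.51),
    "STM": (31.66, 48.32),
    "ssTOKEN": (31.16, 48.05),
    "XTF": (30.8, 47.3),
    "Token Cleaning": (30.58, 47.94),
    "LESS": (30.42, 47.92),
    "LoRA": (29.93, 47.42),
    "Standard FT": (28.9, 44.2),
}


def gaps_vs(baseline: str) -> list[tuple[str, float, float]]:
    """Hypothetical helper: return (method, accuracy gap, general-capability gap)
    relative to `baseline`, sorted by Accuracy gap, highest first.
    Gaps are rounded to 2 decimals to hide floating-point noise."""
    base_acc, base_gen = RESULTS[baseline]
    rows = [
        (name, round(acc - base_acc, 2), round(gen - base_gen, 2))
        for name, (acc, gen) in RESULTS.items()
        if name != baseline
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, d_acc, d_gen in gaps_vs("Standard FT"):
        print(f"{name:<15} {d_acc:+.2f} {d_gen:+.2f}")
```

Every listed method scores above Standard FT on both metrics. The two metrics do not rank the methods identically, though: XTF is ahead of Token Cleaning and LESS on Accuracy (30.8 vs 30.58 and 30.42) but behind both on General Capability Average Accuracy (47.3 vs 47.94 and 47.92).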