Zero-shot Downstream Evaluation on LM Evaluation Harness v1.0.0
[Chart: HellaSwag Accuracy. Point shown: Baseline FT, 50.1, May 13, 2026. Selectable metrics: HellaSwag, ARC-E, ARC-C, PIQA, WinoGrande, BoolQ, MMLU, LAMB, and Average Zero-Shot Accuracy.]
Updated 14d ago
Evaluation Results
Zero-shot accuracy per task; the last column is Average Zero-Shot Accuracy.

| Method | Configuration | Date | HellaSwag | ARC-E | ARC-C | PIQA | WinoGrande | BoolQ | MMLU | LAMB | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline FT | Backbone=Qwen3-0.6B, E... | 2026.05 | 50.1 | 61.5 | 36.7 | 69.2 | 57 | 70.3 | 45.2 | 50 | 55 |
| Delta Block | Backbone=Qwen3-0.6B, E... | 2026.05 | 50 | 66.5 | 37.3 | 69.4 | 56.7 | 70.3 | 44.8 | 49.8 | 55.6 |
| AttnRes | Backbone=Qwen3-0.6B, E... | 2026.05 | 49.4 | 60.2 | 35.3 | 68.8 | 57.9 | 70.3 | 42.5 | 48.4 | 54.1 |
| Pretrained | Backbone=Qwen3-0.6B, E... | 2026.05 | 47.3 | 56.2 | 33.9 | 67.4 | 55.9 | 63.8 | 40.3 | 40 | 50.6 |
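The page does not state how Average Zero-Shot Accuracy is aggregated. The reported averages are consistent with an unweighted mean over the eight tasks, rounded to one decimal place. The sketch below checks that under this assumption; the names `SCORES`, `REPORTED_AVERAGE`, and `mean_accuracy` are illustrative and not part of any evaluation tool.

```python
# Per-task zero-shot accuracies copied from the results above, in the order:
# HellaSwag, ARC-E, ARC-C, PIQA, WinoGrande, BoolQ, MMLU, LAMB.
SCORES = {
    "Baseline FT": [50.1, 61.5, 36.7, 69.2, 57.0, 70.3, 45.2, 50.0],
    "Delta Block": [50.0, 66.5, 37.3, 69.4, 56.7, 70.3, 44.8, 49.8],
    "AttnRes":     [49.4, 60.2, 35.3, 68.8, 57.9, 70.3, 42.5, 48.4],
    "Pretrained":  [47.3, 56.2, 33.9, 67.4, 55.9, 63.8, 40.3, 40.0],
}

# Average Zero-Shot Accuracy values as reported on the page.
REPORTED_AVERAGE = {
    "Baseline FT": 55.0,
    "Delta Block": 55.6,
    "AttnRes": 54.1,
    "Pretrained": 50.6,
}


def mean_accuracy(scores):
    """Unweighted mean of the task scores, rounded to one decimal.

    Assumption: the page's aggregation rule is not documented; this is
    simply the rule that reproduces the reported numbers.
    """
    return round(sum(scores) / len(scores), 1)


if __name__ == "__main__":
    for method, scores in SCORES.items():
        computed = mean_accuracy(scores)
        matches = computed == REPORTED_AVERAGE[method]
        print(f"{method}: computed={computed}, reported={REPORTED_AVERAGE[method]}, match={matches}")
```

Because the mean is unweighted, a large gain on one task (for example Delta Block on ARC-E) moves the average by only one eighth of that gain.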