# Language Model Evaluation on 1.3B LLM Leaderboard
[Chart: benchmark scores over training for ARC, HellaSwag, MMLU, TruthfulQA, WinoGrande, GSM8K, and Average Score; x-axis ends Jan 29, 2024]
## Evaluation Results
| Method | Configuration | Date | ARC | HellaSwag | MMLU | TruthfulQA | WinoGrande | GSM8K | Average Score |
|---|---|---|---|---|---|---|---|---|---|
| QA+C4-85B | Training Steps=300K, S... | 2024.01 | 36.4 | 60.9 | 25.5 | 40.6 | 59.4 | 0.4 | 37.2 |
| QA+C4-100B | Training Steps=350K, S... | 2024.01 | 35.5 | 60.5 | 26.8 | 40.6 | 61.3 | 0.3 | 37.5 |
| Falcon-RW | Seed=1234 | 2024.01 | 35.1 | 63.6 | 25.3 | 36.0 | 62.0 | 0.5 | 37.1 |
| Pythia-1.4b-Pile | Seed=1234 | 2024.01 | 32.7 | 55.0 | 25.6 | 38.7 | 57.3 | 0.8 | 35.0 |
| C4 | Seed=1234 | 2024.01 | 31.7 | 62.1 | 26.7 | 33.4 | 57.9 | 0.9 | 35.5 |

Note: GSM8K scores are fractions of a percent (the rendered page dropped the leading "0.", showing e.g. "40" for 0.4); read this way, each Average Score is the unweighted mean of the six benchmark scores.
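A quick sanity check of the table above: assuming Average Score is the unweighted mean of the six benchmark scores, and reading the GSM8K cells as 0.4, 0.3, 0.5, 0.8, and 0.9 (the page appears to drop the leading "0."), every reported average is reproduced to within rounding:

```python
# Verify that each reported Average Score is the mean of the six benchmarks.
# Assumptions: Average Score = unweighted mean; GSM8K cells are 0.4, 0.3, etc.
rows = {
    "QA+C4-85B":        ([36.4, 60.9, 25.5, 40.6, 59.4, 0.4], 37.2),
    "QA+C4-100B":       ([35.5, 60.5, 26.8, 40.6, 61.3, 0.3], 37.5),
    "Falcon-RW":        ([35.1, 63.6, 25.3, 36.0, 62.0, 0.5], 37.1),
    "Pythia-1.4b-Pile": ([32.7, 55.0, 25.6, 38.7, 57.3, 0.8], 35.0),
    "C4":               ([31.7, 62.1, 26.7, 33.4, 57.9, 0.9], 35.5),
}

for name, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    # Reported averages are rounded to one decimal, so allow rounding slack.
    assert abs(mean - reported) < 0.06, (name, mean, reported)
    print(f"{name}: mean={mean:.2f}, reported={reported}")
```

Interpreting GSM8K as "40" instead of 0.4 would put every average off by more than six points, which is why the fractional reading is used here.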