Sentence completion on HellaSwag (test)
Top result: Coherence Boosting (GPT-3 175B), 72.35 accuracy
[Chart: Accuracy vs. date, Oct 15, 2021 to May 24, 2025]
Evaluation Results
| Method | Setting | Date | Accuracy (%) |
|---|---|---|---|
| Coherence Boosting (GPT-3 175B) | alpha=-0.76 | 2021.10 | 72.35 |
| GPT-3 175B | alpha=-1 | 2021.10 | 62.66 |
| GPT-3 175B | alpha=0 | 2021.10 | 59.18 |
| Coherence Boosting (GPT-2 XL) | alpha=-0.78 | 2021.10 | 47.66 |
| GPT-2 XL (1.6B) | alpha=-1 | 2021.10 | 42.6 |
| GPT-2 XL (1.6B) | alpha=0 | 2021.10 | 40 |
| PGM 6 / 6 (1024) | distillation=Before | 2025.05 | 34.27 |
| PGM 8 / 8 | distillation=Before | 2025.05 | 33.2 |
| PGM 6 / 6 (1024) | distillation=After Dis... | 2025.05 | 32.55 |
| Coherence Boosting (GPT-2 Small) | alpha=-0.9 | 2021.10 | 31.84 |
| PGM 8 / 8 | distillation=After Dis... | 2025.05 | 31.62 |
| MDLM | distillation=Before | 2025.05 | 31.36 |
| GPT-2 Small (125M) | alpha=-1 | 2021.10 | 30.99 |
| MDLM | distillation=After Dis... | 2025.05 | 30.75 |
| GPT-2 Small (125M) | alpha=0 | 2021.10 | 28.92 |
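Several rows differ only in their alpha setting, and HellaSwag is scored as 4-way multiple-choice accuracy. The sketch that follows illustrates both ideas under stated assumptions. It assumes an alpha-weighted score of log p(ending | full context) + alpha * log p(ending | short context), so that alpha=0 is the unmodified model. This is an assumption about how Coherence Boosting combines scores, not its reference implementation. The helper names (`boosted_score`, `pick_ending`, `accuracy`) and the toy log-probabilities are hypothetical.

```python
from typing import Sequence


def boosted_score(logp_full: float, logp_short: float, alpha: float) -> float:
    """Hypothetical alpha-weighted score for one candidate ending.

    Assumption: score = log p(ending | full context) + alpha * log p(ending | short context).
    With alpha = 0 this reduces to the plain full-context log-likelihood.
    """
    return logp_full + alpha * logp_short


def pick_ending(logp_full: Sequence[float], logp_short: Sequence[float], alpha: float) -> int:
    """Return the index of the highest-scoring ending (first one wins on ties)."""
    scores = [boosted_score(f, s, alpha) for f, s in zip(logp_full, logp_short)]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of examples whose predicted ending index matches the gold label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy log-probabilities for one 4-ending example (illustrative values, not model outputs).
full = [-12.0, -10.5, -11.0, -13.0]
short = [-9.0, -6.0, -10.0, -11.5]

print(pick_ending(full, short, alpha=0.0))   # 1: plain full-context likelihood
print(pick_ending(full, short, alpha=-1.0))  # 2: short-context likelihood subtracted out
print(accuracy([1, 2, 0, 3], [2, 2, 0, 3]))  # 75.0
```

A negative alpha penalizes endings that are already likely from the short context alone. This is why, in the example, the choice can change as alpha moves from 0 toward -1.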