Language Modeling on Broad evaluation suite unseen S1 (dev)
[Chart: Average Accuracy by date. Highlighted point: all-FA, 74.2, Apr 21, 2026. Selectable metrics: Average Accuracy, Retention, Speedup @32k. Updated 1mo ago.]
Evaluation Results
| Method | Configuration | Date | Average Accuracy | Retention (%) | Speedup @32k (×) |
|---|---|---|---|---|---|
| all-FA | FA=48, SWA=0, KDA=0, G... | 2026.04 | 74.2 | 100 | 1 |
| Idealized\|All–18 | FA=13, SWA=32, KDA=1,... | 2026.04 | 71.8 | 97 | 2 |
| Reg\|Lklhd–26 | FA=12, SWA=26, KDA=6,... | 2026.04 | 71.1 | 96 | 2.9 |
| Reg\|Lklhd–18 | FA=3, SWA=25, KDA=4, G... | 2026.04 | 69.7 | 94 | 4.8 |
| Idealized\|Lklhd–6 | FA=0, SWA=30, KDA=5, G... | 2026.04 | 66.8 | 90 | 6.2 |
| Idealized\|All–6 | FA=0, SWA=30, KDA=5, G... | 2026.04 | 65.3 | 88 | 6.1 |
| Reg\|Lklhd–13 | FA=0, SWA=16, KDA=13,... | 2026.04 | 60.2 | 81 | 6.9 |
| Reg\|Lklhd–10 | FA=0, SWA=10, KDA=5, G... | 2026.04 | 57.2 | 77 | 10.7 |
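The Retention figures appear to be each method's Average Accuracy expressed as a percentage of the all-FA baseline (74.2), rounded to the nearest integer. The page does not state this; it is an inference from the listed numbers. A minimal Python sketch that checks the assumption against the listed values (the function name `retention` is hypothetical):

```python
# Average Accuracy values copied from the results listed above.
accuracy = {
    "all-FA": 74.2,
    "Idealized|All–18": 71.8,
    "Reg|Lklhd–26": 71.1,
    "Reg|Lklhd–18": 69.7,
    "Idealized|Lklhd–6": 66.8,
    "Idealized|All–6": 65.3,
    "Reg|Lklhd–13": 60.2,
    "Reg|Lklhd–10": 57.2,
}

# Retention values as listed on the page, in the same order.
listed_retention = {
    "all-FA": 100,
    "Idealized|All–18": 97,
    "Reg|Lklhd–26": 96,
    "Reg|Lklhd–18": 94,
    "Idealized|Lklhd–6": 90,
    "Idealized|All–6": 88,
    "Reg|Lklhd–13": 81,
    "Reg|Lklhd–10": 77,
}


def retention(acc: float, baseline: float) -> int:
    """Assumed definition: accuracy as a rounded percentage of the baseline."""
    return round(100 * acc / baseline)


baseline = accuracy["all-FA"]
computed = {name: retention(acc, baseline) for name, acc in accuracy.items()}

# Report any rows where the assumed formula disagrees with the page.
mismatches = {
    name: (computed[name], listed_retention[name])
    for name in accuracy
    if computed[name] != listed_retention[name]
}
print("mismatches:", mismatches)
```

If the assumption holds, `mismatches` is empty for every row listed here. Speedup @32k cannot be recomputed the same way, since the underlying timings are not shown.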