Zero-shot Language Modeling on WikiText-103 and LAMBADA
[Chart: WikiText-103 and LAMBADA perplexity over time. Best result: HAM Learned router + EDA, 19.49 WikiText-103 perplexity (Mar 20, 2026). Updated 24d ago.]
Evaluation Results

| Method | Setting | Date | WikiText-103 Perplexity | LAMBADA Perplexity |
|---|---|---|---|---|
| HAM Learned router + EDA | Scale=800M, KV Cache U... | 2026.03 | 19.49 | 14.9 |
| Transformer | Scale=800M, KV Cache U... | 2026.03 | 19.66 | 13.44 |
| HAM Fixed τ | Scale=800M, KV Cache U... | 2026.03 | 19.85 | 14.16 |
| HAM Learned τ | Scale=800M, KV Cache U... | 2026.03 | 19.94 | 15.67 |
| HAM Learned router | Scale=800M, KV Cache U... | 2026.03 | 20.15 | 13.96 |
| GDN-GSA | Scale=800M | 2026.03 | 22.04 | 14.39 |
| GDN | Scale=800M | 2026.03 | 22.91 | 15.41 |
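For reference, perplexity scores like those in the table are conventionally the exponential of the mean per-token negative log-likelihood under the model. A minimal sketch in Python (the example NLL values are illustrative, not taken from any of the models above):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats).

    Lower is better: a perplexity of k means the model is, on average,
    as uncertain as a uniform choice over k tokens.
    """
    return math.exp(sum(token_nlls) / len(token_nlls))

# Illustrative per-token NLLs from a hypothetical evaluation run.
example_nlls = [2.97, 3.01, 2.95, 3.02]
ppl = perplexity(example_nlls)
```

In zero-shot evaluation, these per-token losses are accumulated over the full test corpus (WikiText-103) or over the target word (LAMBADA) without any fine-tuning on the benchmark.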