Character-level Language Modeling

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Last Updated
enwik8 (test)	Transformer-XL + RMS dynamic eval + decay	BPC	0.94	195	4mo ago
text8 (test)	Transformer-XL + RMS dynamic eval + decay	BPC	1.038	128	4mo ago
Penn Treebank (test)	3 layer LSTM	BPC	1.175	113	4mo ago
Shakespeare modern	SGHMC	Accuracy	55.63	48	4mo ago
Hutter Prize Wikipedia (test)	mLSTM	Bits/Char	1.08	28	4mo ago
Shakespeare (val)	Byz-NSGDM	Perplexity	10.08	27	3mo ago
Penn Treebank char-level (test)	dense-IndRNN	BPC	1.16	25	4mo ago
Enwik8 (val)	Adaptive Transformer	BPC	1.04	23	1mo ago
Tiny Shakespeare (val)	EGA-MORLET	Validation Loss	1.355	19	1mo ago
text8		BPC	0.98	16	4mo ago
text8 (held-out 1M tokens)	SHARP	BPC	2.3	14	1mo ago
text8 (dev)	Transformer + adaptive span	BPC	1.01	13	4mo ago
Shakespeare (train)	Adam	Accuracy	59.8	12	2mo ago
Shakespeare (test)	Adam	Accuracy	50.2	12	2mo ago
enwik8 (train)	RWKV-RNN	BPC	0.72	12	4mo ago
enwik8 (dev)	Adaptive	BPC	1	10	4mo ago
Penn Treebank character-level (val)	LayerNorm HM-LSTM	BPC	1.24	10	2mo ago
enwik8 one-million-parameter scale (val)	FSN, converged	BPC	1.5953	7	1mo ago
text8 (most recent 1M tokens)	SHARP	BPC	2.23	7	1mo ago
text8 100M regime Backward stream	Transformer (ctx=1024)	Backward BPC	2.17	7	1mo ago
text8 100M regime (Current stream split)	Transformer (ctx=1024)	Current BPC	2.12	7	1mo ago
text8 100M regime (Forward split)	Transformer (ctx=1024)	Forward BPC	2.19	7	1mo ago
Shakespeare	MINGRU + αCMRU	Cross-entropy Loss	1.441	6	2mo ago
Billion Words (test)	DGflow	JS Divergence (Context 4)	0.186	4	4mo ago
codeparrot sequence length 256 (val)	FSN, no phases	Bits per Character (BPC)	1.2217	3	1mo ago

Showing 25 of 31 rows