Language Modeling on 100 Billion Word Google News Dataset (test)
[Chart: Test Perplexity (0.1 epochs) and Test Perplexity (1 epoch) over time; best entry MoE-16384-h at 38.2, Jan 23, 2017.]
Evaluation Results

| Method | Links | Date | Test Perplexity (0.1 epochs) | Test Perplexity (1 epoch) |
| --- | --- | --- | --- | --- |
| MoE-16384-h | ops/timestep (millions... | 2017.01 | 38.2 | 29.7 |
| MoE-65536-h | ops/timestep (millions... | 2017.01 | 38.2 | 28.9 |
| MoE-4096-h | ops/timestep (millions... | 2017.01 | 38.9 | 30.9 |
| MoE-131072-h | ops/timestep (millions... | 2017.01 | 39.8 | 29.2 |
| MoE-1024-h | ops/timestep (millions... | 2017.01 | 40.3 | 32.7 |
| MoE-256-h | ops/timestep (millions... | 2017.01 | 42.8 | 35.3 |
| MoE-32 | ops/timestep (millions... | 2017.01 | 48.5 | 40.4 |
| 4xLSTM-512 | ops/timestep (millions... | 2017.01 | 54.5 | 47.0 |
| Kneser-Ney 5-gram | ops/timestep (millions... | 2017.01 | 67.1 | 45.3 |
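For reference, test perplexity is the exponential of the average per-token negative log-likelihood assigned by the model to the held-out set. A minimal sketch of that computation (the NLL values below are hypothetical, not from any model in the table):

```python
import math

def perplexity(neg_log_likelihoods):
    """Perplexity = exp(mean per-token negative log-likelihood, in nats)."""
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

# Hypothetical per-token NLLs; a real evaluation averages over the test corpus.
nlls = [3.6, 3.7, 3.5, 3.8]
print(round(perplexity(nlls), 2))  # → 38.47
```

Lower is better: a perplexity of 38.2 means the model is, on average, as uncertain as a uniform choice over about 38 words per position.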