Autoregressive Language Modeling on WikiText-103 (first 10M tokens)
[Chart: Perplexity (PPL) and Relative Change vs TF (%) by method, as of Apr 9, 2026]
Evaluation Results

| Method | Configuration | Date | Perplexity (PPL) | Relative Change vs TF (%) |
|---|---|---|---|---|
| TF-GPT | Inference Cache=O(N) | 2026.04 | 90.5 | - |
| MIPT+Cache | K=64, Inference Cache=... | 2026.04 | 92.1 | 1.8 |
| MIPT+Cache | K=16, Inference Cache=... | 2026.04 | 96.3 | 6.4 |
| MIPT+Cache | K=8, Inference Cache=O(8) | 2026.04 | 98.1 | 8.4 |
| MIPT-LM | Inference Cache=O(1) | 2026.04 | 102.2 | 12.9 |
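The "Relative Change vs TF (%)" column appears to be derived from the perplexity values, using TF-GPT (PPL 90.5) as the baseline. A minimal sketch of that computation (the method labels in the dictionary are shorthand for the table rows, not names from the source):

```python
# Reproduce the "Relative Change vs TF (%)" column from the PPL values above.
# TF-GPT is the baseline; a positive change means worse (higher) perplexity.

TF_BASELINE_PPL = 90.5

results = {
    "MIPT+Cache (K=64)": 92.1,
    "MIPT+Cache (K=16)": 96.3,
    "MIPT+Cache (K=8)": 98.1,
    "MIPT-LM": 102.2,
}

def relative_change(ppl: float, baseline: float = TF_BASELINE_PPL) -> float:
    """Percent increase in perplexity over the TF-GPT baseline."""
    return round((ppl - baseline) / baseline * 100, 1)

for method, ppl in results.items():
    print(f"{method}: PPL={ppl}, +{relative_change(ppl)}% vs TF")
# MIPT+Cache (K=64): PPL=92.1, +1.8% vs TF
# MIPT+Cache (K=16): PPL=96.3, +6.4% vs TF
# MIPT+Cache (K=8): PPL=98.1, +8.4% vs TF
# MIPT-LM: PPL=102.2, +12.9% vs TF
```

Rounding to one decimal place matches every entry in the table, which suggests the column is computed rather than measured independently.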