WikiText-2

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | |
|---|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL) | 1.08 | 2,320 |
| Open-ended text generation | WikiText-2 | COH Score | 0.811 | 112 |
| Perplexity | WikiText-2 | Perplexity | 2.81 | 97 |
| Language Modeling | WikiText-2 | Average Kurtosis | 93.039 | 72 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS) | 7.7 | 70 |
| Language Modeling | WikiText-2 raw (test) | Perplexity | 5.674 | 63 |
| Text Generation | WikiText-2 | Perplexity | 10.98 | 50 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 5.82 | 40 |
| Language Modeling | WikiText-2 | Mauve | 0.9 | 33 |
| Language Modeling | WikiText-2 | WikiText-2 Score | 4.93 | 32 |
| Feature Space Preservation | WikiText-2 | Cosine Similarity | 100 | 32 |
| Language Modeling | WikiText-2 context length 2048 (val) | WikiText-2 PPL | 9.92 | 24 |
| Language Modeling | WikiText-2 | Perplexity | 9.6 | 22 |
| Language Modeling | WikiText-2 Llama-3.1-8B-Instruct (test) | Perplexity | 7.2 | 22 |
| Language Modeling | WikiText-2 v1 (val) | Perplexity | 42.41 | 20 |
| Language Modeling | WikiText-2 | Perplexity (PPL) | 4.91 | 19 |
| Language Modeling | WikiText-2 10K-word evaluator standardized | PPL Delta (%) | 11.9 | 18 |
| Language Modeling | Wikitext 2 Llama 2 & 3 (test) | PPL (Llama 2, Config 7) | 5.47 | 16 |
| Language Generation | WikiText-2 (test) | Perplexity | 3.319 | 16 |
| Text Generation | WikiText-2 | ROUGE-1 | 39.1 | 15 |
| Language Modeling | WikiText-2 context length 4096 (test) | PPL (WikiText-2) | 5.11 | 15 |
| Language Modeling | WikiText-2 2017 (test) | PPL (Uniform) | 7.5 | 12 |
| Language Modeling | WikiText-2 | Perplexity (Baseline) | 5.44 | 12 |
| Language Modeling | Wikitext-2 Standard (val) | Perplexity | 40.2 | 12 |
| Causal Prediction | WikiText-2 (val) | Min Validation Loss | 5.4856 | 11 |
Showing 25 of 48 rows