WikiText-2

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Language Modeling | WikiText-2 | Perplexity (PPL): 1.61 | 1,624 |
| Open-ended text generation | WikiText-2 | COH Score: 0.811 | 112 |
| Language Modeling | WikiText-2 raw (test) | Perplexity: 5.674 | 57 |
| Feature Space Preservation | WikiText-2 | Cosine Similarity: 100 | 32 |
| Language Modeling | WikiText-2 context length 2048 (val) | WikiText-2 PPL: 9.92 | 24 |
| Language Modeling | WikiText-2 Llama-3.1-8B-Instruct (test) | Perplexity: 7.2 | 22 |
| Language Modeling | WikiText-2 v1 (val) | Perplexity: 42.41 | 20 |
| Language Modeling | Wikitext 2 Llama 2 & 3 (test) | PPL (Llama 2, Config 7): 5.47 | 16 |
| Language Modeling | WikiText-2 context length 4096 (test) | PPL (WikiText-2): 5.11 | 15 |
| Language Modeling | Wikitext-2 Standard (val) | Perplexity: 40.2 | 12 |
| Causal Prediction | WikiText-2 (val) | Min Validation Loss: 5.4856 | 11 |
| Perplexity | WikiText-2 | Perplexity: 5.5 | 10 |
| White-box robustness against single point failure attacks | WikiText-2 | Original Perplexity (PPL): 5.03 | 8 |
| Language Modeling | WikiText-2 | Perplexity (Baseline): 11.02 | 8 |
| Language Generation | WikiText-2 (test) | Perplexity: 3.319 | 8 |
| Language Modeling | WikiText-2 context length 2048 (test) | Perplexity: 7.15 | 7 |
| Language Modeling | Wikitext-2 word-level (dev) | PPL: 66.9 | 7 |
| Language Modeling | WikiText-2 TinyLlama | ΔPPL: 0.0036 | 6 |
| Language Modeling | WikiText-2 Mistral-7B | ΔPPL: 0.001 | 6 |
| Language Modeling | WikiText-2 (val) | Perplexity (BVS): 28.45 | 5 |
| Language Modeling | WikiText-2 Shifted | Shifted PPL: 98.2 | 5 |
| Language Modeling | WikiText-2 raw-v1 (val) | Cross Entropy (CE): 2.744 | 5 |
| Language Modeling | WikiText-2 context length 8192 (test) | Perplexity: 6.5 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (test) | NLL: 5.12 | 5 |
| Language Modeling | WikiText-2 Top 5% most uncertain tokens (val) | NLL: 5.13 | 5 |
Showing 25 of 29 rows
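
The rows above report a mix of perplexity, cross entropy, and NLL. As a rough aid for reading them side by side, the sketch below shows the usual relationship: perplexity is the exponential of the mean per-token negative log-likelihood. It assumes natural-log (nat) units; the listing does not state the log base or tokenization behind each row, so this is an illustration and not a way to convert between specific entries. The function names and the sample values are hypothetical.

```python
import math


def perplexity_from_nll(token_nlls):
    """Return exp(mean NLL), assuming per-token NLLs measured in nats."""
    if not token_nlls:
        raise ValueError("need at least one token NLL")
    return math.exp(sum(token_nlls) / len(token_nlls))


def cross_entropy_from_perplexity(ppl):
    """Inverse mapping: mean cross entropy in nats for a given perplexity."""
    if ppl <= 0:
        raise ValueError("perplexity must be positive")
    return math.log(ppl)


if __name__ == "__main__":
    # Hypothetical per-token NLLs, not taken from any benchmark row.
    example_nlls = [2.0, 3.0, 2.5, 3.5]
    ppl = perplexity_from_nll(example_nlls)
    print(f"perplexity: {ppl:.3f}")
    print(f"cross entropy (nats): {cross_entropy_from_perplexity(ppl):.3f}")
```

Under the same assumption, a lower cross entropy always corresponds to a lower perplexity, but values computed over different tokenizations or splits are still not directly comparable.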