Long-context language understanding on RULER 128k
[Chart: Average Score. Highlighted entry: Vanilla, 49.11 (Dec 3, 2025).]
Updated 4d ago
Evaluation Results
| Method | Setting | Date | Average Score | CWE Score | FWE Score | NIAH Multi-Key 1 Score | NIAH Multi-Key 2 Score | NIAH Multi-Query Score | NIAH Multi-Value Score | NIAH Single 1 Score | NIAH Single 2 Score | NIAH Single 3 Score | QA HotpotQA Score | QA SQuAD Score | VT Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vanilla | Context Window=128k, M... | 2025.12 | 49.11 | 89 | 78.67 | 94 | 63 | 16 | 29 | 27 | 90 | 82 | 28 | 21 | 12.8 |
| FusedKV-Lite | Context Window=128k, M... | 2025.12 | 42.31 | 78.3 | 75.67 | 91 | 18 | 0.75 | 28.5 | 98 | 97 | 18 | 20 | 19 | 5.8 |
| FusedKV | Context Window=128k, M... | 2025.12 | 42 | 71.3 | 64 | 85 | 4 | 5.25 | 40.25 | 42 | 90 | 87 | 27 | 16 | 10.2 |
| YOCO | Context Window=128k, M... | 2025.12 | 17.32 | 25 | 76 | 8 | 31 | 1.5 | 0 | 47 | 0 | 0 | 21 | 14 | 1.6 |