Long-Context Language Modeling on RULER (4k and 8k Component Evaluation)
Chart: CWE (Context Length 4K) over time (May 11, 2026), highlighted point RoPE, 10.21. Other selectable metrics: CWE (8K), NIAH (Key 1), QA (Key 2) and VT, each at context lengths 4K and 8K.
Updated 21d ago
Evaluation Results
| Method | Model | Date | CWE 4K | CWE 8K | NIAH (Key 1) 4K | NIAH (Key 1) 8K | QA (Key 2) 4K | QA (Key 2) 8K | VT 4K | VT 8K |
|---|---|---|---|---|---|---|---|---|---|---|
| RoPE | nanoGPT-44M | 2026.05 | 10.21 | 11.52 | 12.19 | 12.59 | 8.45 | 8.95 | 10.43 | 11.11 |
| p-RoPE | nanoGPT-44M | 2026.05 | 9.97 | 10.8 | 11.98 | 12.06 | 8.63 | 8.99 | 9.87 | 10.68 |
| GAPE | nanoGPT-44M | 2026.05 | 9.32 | 9.48 | 10.59 | 11.36 | 8.14 | 8.8 | 9.18 | 10.4 |
| RoPE | GPT2-124M | 2026.05 | 8.21 | 9.13 | 9.51 | 10.57 | 6.45 | 7.54 | 9.9 | 10.7 |
| p-RoPE | GPT2-124M | 2026.05 | 8.1 | 8.82 | 9.26 | 10.05 | 6.32 | 6.88 | 8.5 | 9.53 |
| GAPE | GPT2-124M | 2026.05 | 7.96 | 8.83 | 8.94 | 9.87 | 6.39 | 6.87 | 8.31 | 8.98 |
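To compare rows across all eight columns at once, the scores can be tabulated and summarized programmatically. The sketch below is an illustration only: it copies the numbers from the Evaluation Results table, and the helpers `row_means` and `context_gap` are hypothetical names introduced here. The unweighted mean is an assumed summary, not a figure reported on this page, and since the page does not state whether lower or higher scores are better, the code does not rank the methods.

```python
from statistics import mean

# Scores copied from the Evaluation Results table.
# Column order: each task at 4K context, then at 8K context.
COLUMNS = [
    "CWE 4K", "CWE 8K",
    "NIAH (Key 1) 4K", "NIAH (Key 1) 8K",
    "QA (Key 2) 4K", "QA (Key 2) 8K",
    "VT 4K", "VT 8K",
]

RESULTS = {
    ("RoPE", "nanoGPT-44M"):   [10.21, 11.52, 12.19, 12.59, 8.45, 8.95, 10.43, 11.11],
    ("p-RoPE", "nanoGPT-44M"): [9.97, 10.8, 11.98, 12.06, 8.63, 8.99, 9.87, 10.68],
    ("GAPE", "nanoGPT-44M"):   [9.32, 9.48, 10.59, 11.36, 8.14, 8.8, 9.18, 10.4],
    ("RoPE", "GPT2-124M"):     [8.21, 9.13, 9.51, 10.57, 6.45, 7.54, 9.9, 10.7],
    ("p-RoPE", "GPT2-124M"):   [8.1, 8.82, 9.26, 10.05, 6.32, 6.88, 8.5, 9.53],
    ("GAPE", "GPT2-124M"):     [7.96, 8.83, 8.94, 9.87, 6.39, 6.87, 8.31, 8.98],
}


def row_means(results):
    """Hypothetical helper: unweighted mean over all columns for each (method, model) row."""
    return {key: mean(scores) for key, scores in results.items()}


def context_gap(results, task):
    """Hypothetical helper: 8K score minus 4K score for one task, per row."""
    i = COLUMNS.index(f"{task} 4K")  # the 8K column immediately follows the 4K column
    return {key: scores[i + 1] - scores[i] for key, scores in results.items()}


if __name__ == "__main__":
    for (method, model), avg in row_means(RESULTS).items():
        print(f"{method:7s} {model:12s} mean={avg:.4f}")
    for (method, model), gap in context_gap(RESULTS, "CWE").items():
        print(f"{method:7s} {model:12s} CWE 8K-4K={gap:+.2f}")
```

The 4K-to-8K difference is computed per task because each task appears at both context lengths; averaging across tasks with different scales may hide task-specific behaviour.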