Long-context reasoning on BABILong
[Chart: Err (2k Context) over time. Plotted entry: RoPE++EH, 14.1, Dec 8, 2025. Selectable metrics: Err (2k Context), Err (4k Context), Err (8k Context), Err (16k Context), Err (32k Context), Err (64k Context), Avg Error Rate.]
Updated 1mo ago
Evaluation Results
Method     Details                    Links    Err 2k  Err 4k  Err 8k  Err 16k  Err 32k  Err 64k  Avg Error Rate
RoPE++EH   Model size=376M, Train...  2025.12  14.1    15.6    12.2    9.9      8.3      9.7      11.6
RoPE       Model size=376M, Train...  2025.12  17.7    16.1    9.1     9.4      5.9      7.8      11
RoPE++EC   Model size=376M, Train...  2025.12  19.8    19.8    16.1    15.8     12.3     12.8     16.1
RoPE++EH   Model size=776M, Train...  2025.12  31.9    26.5    18.6    16.2     11       12.2     19.4
RoPE++EC   Model size=776M, Train...  2025.12  32.4    29.9    24.4    24.5     18.6     14.8     24.1
RoPE       Model size=776M, Train...  2025.12  33.5    30.7    23.6    22       15.1     12.1     22.8

Err Nk = Err (Nk Context), the error rate at that context length.
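
The page does not say how Avg Error Rate is derived. The listed values are consistent with an unweighted mean of the six per-context error rates, so the sketch below recomputes it under that assumption. The numbers are copied from the results above; the names `ROWS`, `average_error`, and `check_rows` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Error rates copied from the results above, in the order
# 2k, 4k, 8k, 16k, 32k, 64k, followed by the listed Avg Error Rate.
ROWS = [
    ("RoPE++EH", "376M", [14.1, 15.6, 12.2, 9.9, 8.3, 9.7], 11.6),
    ("RoPE", "376M", [17.7, 16.1, 9.1, 9.4, 5.9, 7.8], 11.0),
    ("RoPE++EC", "376M", [19.8, 19.8, 16.1, 15.8, 12.3, 12.8], 16.1),
    ("RoPE++EH", "776M", [31.9, 26.5, 18.6, 16.2, 11.0, 12.2], 19.4),
    ("RoPE++EC", "776M", [32.4, 29.9, 24.4, 24.5, 18.6, 14.8], 24.1),
    ("RoPE", "776M", [33.5, 30.7, 23.6, 22.0, 15.1, 12.1], 22.8),
]


def average_error(errors):
    """Unweighted mean over context lengths (an assumed definition)."""
    return mean(errors)


def check_rows(rows, tolerance=0.05):
    """Compare the recomputed mean with the listed average for each row.

    The tolerance allows for the listed value being rounded to one decimal.
    """
    results = []
    for method, size, errors, listed in rows:
        recomputed = average_error(errors)
        ok = abs(recomputed - listed) <= tolerance + 1e-9
        results.append((method, size, recomputed, listed, ok))
    return results


if __name__ == "__main__":
    for method, size, recomputed, listed, ok in check_rows(ROWS):
        status = "matches" if ok else "differs"
        print(f"{method:9s} {size}: recomputed {recomputed:.2f}, listed {listed} ({status})")
```

If the leaderboard weighted context lengths differently, the recomputed values would drift from the listed ones, which makes this a quick consistency check when new rows are added.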