Long-context reasoning on BABILong
[Chart: Err (2k Context) over time. Highlighted point: RoPE++EH, 14.1, Dec 8, 2025. Selectable metrics: Err at 2k, 4k, 8k, 16k, 32k, and 64k context; Avg Error Rate.]
Evaluation Results
| Method | Details | Date | Err (2k Context) | Err (4k Context) | Err (8k Context) | Err (16k Context) | Err (32k Context) | Err (64k Context) | Avg Error Rate |
|---|---|---|---|---|---|---|---|---|---|
| RoPE++EH | Model size=376M, Train... | 2025.12 | 14.1 | 15.6 | 12.2 | 9.9 | 8.3 | 9.7 | 11.6 |
| RoPE | Model size=376M, Train... | 2025.12 | 17.7 | 16.1 | 9.1 | 9.4 | 5.9 | 7.8 | 11 |
| RoPE++EC | Model size=376M, Train... | 2025.12 | 19.8 | 19.8 | 16.1 | 15.8 | 12.3 | 12.8 | 16.1 |
| RoPE++EH | Model size=776M, Train... | 2025.12 | 31.9 | 26.5 | 18.6 | 16.2 | 11 | 12.2 | 19.4 |
| RoPE++EC | Model size=776M, Train... | 2025.12 | 32.4 | 29.9 | 24.4 | 24.5 | 18.6 | 14.8 | 24.1 |
| RoPE | Model size=776M, Train... | 2025.12 | 33.5 | 30.7 | 23.6 | 22 | 15.1 | 12.1 | 22.8 |
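
The page does not say how Avg Error Rate is computed. The reported values are consistent with an unweighted mean of the six per-context error rates, rounded to one decimal place; treat that as an assumption rather than the site's documented method. A minimal Python sketch that checks it against the figures listed here (the `mean_error` helper and the dictionary keys are hypothetical names introduced for illustration):

```python
# Assumption: Avg Error Rate = unweighted mean of the six per-context
# error rates (2k, 4k, 8k, 16k, 32k, 64k), rounded to one decimal place.

def mean_error(errors):
    """Return the unweighted mean of a list of error rates, to 1 decimal."""
    return round(sum(errors) / len(errors), 1)

# Per-context error rates copied from the results listed on this page.
# Keys are illustrative labels: (method, model size).
RESULTS = {
    ("RoPE++EH", "376M"): [14.1, 15.6, 12.2, 9.9, 8.3, 9.7],
    ("RoPE", "376M"): [17.7, 16.1, 9.1, 9.4, 5.9, 7.8],
    ("RoPE++EC", "376M"): [19.8, 19.8, 16.1, 15.8, 12.3, 12.8],
    ("RoPE++EH", "776M"): [31.9, 26.5, 18.6, 16.2, 11, 12.2],
    ("RoPE++EC", "776M"): [32.4, 29.9, 24.4, 24.5, 18.6, 14.8],
    ("RoPE", "776M"): [33.5, 30.7, 23.6, 22, 15.1, 12.1],
}

# Avg Error Rate values as reported on the page, in the same order.
REPORTED = {
    ("RoPE++EH", "376M"): 11.6,
    ("RoPE", "376M"): 11,
    ("RoPE++EC", "376M"): 16.1,
    ("RoPE++EH", "776M"): 19.4,
    ("RoPE++EC", "776M"): 24.1,
    ("RoPE", "776M"): 22.8,
}

def check_averages():
    """Map each entry to True if the recomputed mean matches the reported one."""
    return {key: mean_error(errs) == REPORTED[key] for key, errs in RESULTS.items()}

if __name__ == "__main__":
    for key, ok in check_averages().items():
        print(key, mean_error(RESULTS[key]), "matches" if ok else "differs")
```

Under this assumption, the two model sizes rank the methods differently on the average: at 376M, RoPE has the lowest mean, while at 776M, RoPE++EH does.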