Long-context Reasoning on LongBench (Detailed Metrics)

[Chart: NQA; 16.6 (Nirvana); Oct 30, 2025]
Updated 9d ago

Evaluation Results

Method  Links  Date     NQA
               2025.10  16.6  12.8  26    14.6  24.8  9.7   10.4  15.9  15.4  36.4  25.2  22.6  17.5  21.5  19.2
               2025.10  14.8  11.8  25.6  14    23.9  7.7   9.2   15.1  13.5  33    21.2  22.9  16.5  20.9  17.9
               2025.10  14.5  12.3  26.6  12.6  23.6  6.1   9.1   16.1  12.8  33.5  23.9  26.8  15.5  19.2  17.8
               2025.10  14.1  14    23.3  13.7  14.4  5.8   7.5   16.4  7.9   30    22.4  23    18.7  22.1  16.6
               2025.10  13    10.1  20.4  10.1  16.7  6     7.2   15.9  8.4   23.1  21.9  11.2  17.9  19    14.6
               2025.10  12.9  10.8  21.5  10.9  13.2  5.1   6.5   13.5  7.2   15.5  23.3  11.6  17.6  20.3  13.6
               2025.10  12.7  13    27.1  12.7  20.6  7.5   10.4  16.2  13    40.5  22.7  27.9  19.9  22.1  18.4
               2025.10  12.5  12.9  25.4  11.2  19.7  6.8   9.1   15.7  11    20    22.7  22.8  18.1  21.1  15.9
               2025.10  12.1  10.7  19.1  10.7  18    5.8   4.8   15.8  7.9   19    18    12.8  14.1  17.9  13.2
               2025.10  11.8  9.3   10    10.9  4.2   6.1   7.4   15.8  6.6   16.9  13.5  3.9   17.2  18.7  11
               2025.10  11.1  11.3  18.6  11.8  15.1  6.7   6.7   14.5  7.4   13    23.6  8.4   17.9  20.6  13.5
               2025.10  10.7  12.1  19.1  11.3  15.7  6     5.2   15.1  9.2   16    15.8  10.3  18.6  20.8  13.5