Long-context Question Answering on NarrativeQA Passage Split
[Chart: Score by date. Highlighted point: Baseline, 32.64, Mar 5, 2026.]
Evaluation Results

Method         Configuration              Links    Score
Baseline       Model=ChatGLM, Setting...  2026.03  32.64
Our + Reorder  Model=LLaMA, Setting=P...  2026.03  32.43
No Recompute   Model=ChatGLM, Setting...  2026.03  31.81
Our + Reorder  Model=ChatGLM, Setting...  2026.03  31.8
CacheBlend     Model=ChatGLM, Setting...  2026.03  31.79
EPIC (15%)     Model=ChatGLM, Setting...  2026.03  31.42
Our            Model=LLaMA, Setting=P...  2026.03  31.41
EPIC (15%)     Model=LLaMA, Setting=P...  2026.03  31.02
Our            Model=ChatGLM, Setting...  2026.03  31
CacheBlend     Model=LLaMA, Setting=P...  2026.03  30.95
No Recompute   Model=LLaMA, Setting=P...  2026.03  28.1
Our + Reorder  Model=Qwen, Setting=Pa...  2026.03  23.1
EPIC (15%)     Model=Qwen, Setting=Pa...  2026.03  22.91
Our            Model=Qwen, Setting=Pa...  2026.03  22.88
CacheBlend     Model=Qwen, Setting=Pa...  2026.03  21.97
No Recompute   Model=Qwen, Setting=Pa...  2026.03  20.78
Baseline       Model=LLaMA, Setting=P...  2026.03  18.62
Baseline       Model=Qwen, Setting=Pa...  2026.03  16.54