Long-context Question Answering on 2WikiMQA (Passage Split)
Best score: 52.53 (Baseline)
[Chart: score over time; data dated Mar 5, 2026]
Evaluation Results
Method          Configuration               Date     Score
Baseline        Model=ChatGLM, Setting...   2026.03  52.53
Baseline        Model=Qwen, Setting=Pa...   2026.03  51.61
Our + Reorder   Model=Qwen, Setting=Pa...   2026.03  50.58
Our             Model=Qwen, Setting=Pa...   2026.03  50.19
Our             Model=ChatGLM, Setting...   2026.03  48.9
Our + Reorder   Model=ChatGLM, Setting...   2026.03  46.66
Baseline        Model=LLaMA, Setting=P...   2026.03  45.88
EPIC (15%)      Model=ChatGLM, Setting...   2026.03  45.21
Our + Reorder   Model=LLaMA, Setting=P...   2026.03  44.17
CacheBlend      Model=Qwen, Setting=Pa...   2026.03  43.3
Our             Model=LLaMA, Setting=P...   2026.03  42.08
CacheBlend      Model=LLaMA, Setting=P...   2026.03  39.76
EPIC (15%)      Model=LLaMA, Setting=P...   2026.03  38.85
CacheBlend      Model=ChatGLM, Setting...   2026.03  37.57
EPIC (15%)      Model=Qwen, Setting=Pa...   2026.03  36.97
No Recompute    Model=ChatGLM, Setting...   2026.03  35.23
No Recompute    Model=LLaMA, Setting=P...   2026.03  30.66
No Recompute    Model=Qwen, Setting=Pa...   2026.03  11.62