Long-context Question Answering on HotpotQA Fixed Chunk 2048
[Chart: QA Score by date. Top entry: 60.03 QA Score (Baseline), Mar 5, 2026.]
Updated 1mo ago
Evaluation Results
Method         Configuration              Date     QA Score
-------------  -------------------------  -------  --------
Baseline       Model=ChatGLM, Setting...  2026.03  60.03
Our            Model=Qwen, Setting=Fi...  2026.03  59.67
Baseline       Model=Qwen, Setting=Fi...  2026.03  59.22
Our + Reorder  Model=ChatGLM, Setting...  2026.03  58.2
Our            Model=ChatGLM, Setting...  2026.03  57.39
Baseline       Model=LLaMA, Setting=F...  2026.03  54.1
EPIC (15%)     Model=ChatGLM, Setting...  2026.03  53.62
CacheBlend     Model=Qwen, Setting=Fi...  2026.03  53.52
EPIC (15%)     Model=Qwen, Setting=Fi...  2026.03  52.84
CacheBlend     Model=ChatGLM, Setting...  2026.03  51.77
Our            Model=LLaMA, Setting=F...  2026.03  51.5
Our + Reorder  Model=Qwen, Setting=Fi...  2026.03  50.53
Our + Reorder  Model=LLaMA, Setting=F...  2026.03  50.53
No Recompute   Model=ChatGLM, Setting...  2026.03  50.24
EPIC (15%)     Model=LLaMA, Setting=F...  2026.03  47.55
CacheBlend     Model=LLaMA, Setting=F...  2026.03  47.2
No Recompute   Model=LLaMA, Setting=F...  2026.03  46.71
No Recompute   Model=Qwen, Setting=Fi...  2026.03  46.33