Long-Document Question Answering on LongBench
[Chart: NarQA Score over time (selectable metrics: NarQA, Qasper, HotpotQA, 2Wiki Score); top result Qwen-LiteCoST at 30.4, Mar 31, 2026.]
Evaluation Results
Method          Details                    Date     NarQA  Qasper  HotpotQA  2Wiki
Qwen-LiteCoST   Base model=Qwen2-7B, P...  2026.03  30.4   44.64   68.39     65.73
GPT-4o          -                          2026.03  28.68  43.39   67.68     68.29
LLaMA-LiteCoST  Base model=LLaMA-3.2-3...  2026.03  27.24  41.37   66.86     67.52
GPT-4o-mini     -                          2026.03  24.38  40.28   65.03     65.15
Qwen2-7B        Parameters=7B              2026.03  19.49  35.67   45.06     41.4
LLaMA-3.2-3B    Parameters=3B              2026.03  16.94  34.46   54.92     51.82

Columns are per-subset scores on LongBench: NarQA = NarrativeQA, Qasper, HotpotQA, 2Wiki = 2WikiMultihopQA. Rows are sorted by NarQA score.
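The table reports four per-subset scores but no single aggregate. The sketch below ranks the methods by an unweighted mean across the four subsets. Note that this mean is an assumption made for illustration and is not an official LongBench aggregate. The sketch also computes the per-subset gain of Qwen-LiteCoST over Qwen2-7B, which the table lists as its base model. The helper names `rank_by_mean` and `gains` are hypothetical.

```python
from statistics import mean

# Scores copied from the evaluation table (order: NarQA, Qasper, HotpotQA, 2Wiki).
SUBSETS = ("NarQA", "Qasper", "HotpotQA", "2Wiki")
SCORES: dict[str, tuple[float, float, float, float]] = {
    "Qwen-LiteCoST": (30.4, 44.64, 68.39, 65.73),
    "GPT-4o": (28.68, 43.39, 67.68, 68.29),
    "LLaMA-LiteCoST": (27.24, 41.37, 66.86, 67.52),
    "GPT-4o-mini": (24.38, 40.28, 65.03, 65.15),
    "Qwen2-7B": (19.49, 35.67, 45.06, 41.4),
    "LLaMA-3.2-3B": (16.94, 34.46, 54.92, 51.82),
}


def rank_by_mean(scores: dict[str, tuple[float, ...]]) -> list[tuple[str, float]]:
    """Hypothetical helper: (method, unweighted mean over subsets), highest first."""
    averaged = [(name, mean(vals)) for name, vals in scores.items()]
    return sorted(averaged, key=lambda item: item[1], reverse=True)


def gains(model: str, baseline: str, scores: dict[str, tuple[float, ...]]) -> dict[str, float]:
    """Hypothetical helper: per-subset score difference (model - baseline), rounded to 2 places."""
    return {
        subset: round(m - b, 2)
        for subset, m, b in zip(SUBSETS, scores[model], scores[baseline])
    }


if __name__ == "__main__":
    for name, avg in rank_by_mean(SCORES):
        print(f"{name:<15} {avg:6.2f}")
    print(gains("Qwen-LiteCoST", "Qwen2-7B", SCORES))
```

Under this unweighted mean, the top four methods keep their NarQA order (Qwen-LiteCoST 52.29, GPT-4o 52.01). However, LLaMA-3.2-3B (39.54) ranks above Qwen2-7B (35.41) because of its stronger HotpotQA and 2Wiki scores. Qwen-LiteCoST gains most over Qwen2-7B on the multi-hop subsets: HotpotQA improves by 23.33 points and 2Wiki by 24.33.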