Question Answering on LongBench HotpotQA (F1)
Top F1 Score: 63.58 (CORE)
[Chart: F1 Score and Token Count by date, Aug 24, 2025 to Feb 13, 2026]
Updated 4d ago
Evaluation Results
| Method | Configuration | Date | F1 Score | Token Count |
|---|---|---|---|---|
| CORE | Generator LLM=Qwen2.5-... | 2025.08 | 63.58 | 126 |
| Full Context | Generator LLM=Qwen2.5-... | 2025.08 | 62.22 | 9,151 |
| LongLLMLingua | Generator LLM=Qwen2.5-... | 2025.08 | 57.69 | 907 |
| No Context | Generator LLM=Qwen2.5-... | 2025.08 | 51.13 | 0 |
| Random | Context Budget (B)=2048 | 2026.02 | 28.4 | - |
| CurvPrune | Context Budget (B)=2048 | 2026.02 | 27.8 | - |
| BM25 | Context Budget (B)=2048 | 2026.02 | 27.3 | - |
| BM25+Tex | Context Budget (B)=2048 | 2026.02 | 27 | - |
| Sel. Ctx | Context Budget (B)=2048 | 2026.02 | 20.1 | - |
| LLMLingua | Context Budget (B)=2048 | 2026.02 | 18.8 | - |
| Recency | Context Budget (B)=2048 | 2026.02 | 12.2 | - |
| Head+Tail | Context Budget (B)=2048 | 2026.02 | 8.2 | - |
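For readers unfamiliar with the metric, the sketch below shows one common way a token-level F1 score is computed for question answering (SQuAD-style normalization and token overlap). This page does not state the exact scoring script behind the numbers above, so the normalization rules and the function names `normalize_answer` and `token_f1` are assumptions for illustration only.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the leaderboard's own script may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answers."""
    pred_tokens = normalize_answer(prediction)
    ref_tokens = normalize_answer(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a leaderboard-style figure would be the mean of `token_f1` over all questions, multiplied by 100 to match the 0 to 100 scale used in the table.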