Long-context Question Answering on LongBench

Top result: 59.71 HotPotQA Accuracy (Full Attention)

[Chart: HotPotQA Accuracy over time, Oct 7, 2025 to Apr 13, 2026]
Updated 7d ago

Evaluation Results

Date      HotPotQA Accuracy
-         59.71    -      -      -      64.93  -      54.36  -      -
2025.10   55.95    -      29.17  47.79  54.46  41.33  46.27  30.57  25.1
2025.10   55.75    -      29.25  46.87  54.25  41.24  46.51  31.07  24.98
2025.10   55.74    -      -      -      61.37  -      46.04  -      -
2025.10   55.59    -      31.34  43.05  53.21  40.04  44.33  27.58  25.15
2025.10   52.57    -      27.15  43.96  53.79  38.64  45.5   23.13  24.38
2025.10   52.12    -      -      -      62.43  -      41.4   -      -
2025.10   51.98    -      30.01  44.48  52.95  39.35  47.28  23.29  25.45
2026.04   49.35    85.72  19.06  18.25  49.49  44.37  -      -      -
2026.04   47.84    85.93  18.44  14.7   48.43  43.07  -      -      -
2026.04   47.77    83.29  17.37  14.12  49.9   42.49  -      -      -
2026.04   45.68    83.16  15.65  13.17  48.33  41.2   -      -      -
2025.10   45.29    -      -      -      59.7   -      38.34  -      -
-         44.63    -      -      -      49.44  -      37.7   -      -
2025.10   44.11    -      -      -      57.34  -      41.08  -      -
2026.04   43.36    80.32  18.69  15.63  46.74  40.95  -      -      -
2026.04   32.59    80.92  11     19.28  42.3   37.22  -      -      -
2026.04   32.57    78.36  8.73   18.21  43.2   36.21  -      -      -
2026.04   32.5     78.43  9.8    19.36  43.08  36.63  -      -      -
2026.04   27.31    70.4   6.44   17.29  41.4   32.57  -      -      -