
Long-context Question Answering on HotpotQA

Mean Score: 65.49

Full Model

[Chart: scores over time, Jan 28, 2026 to Feb 3, 2026]
Updated 4d ago

Evaluation Results

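| Method (date) | Mean Score | Col. 2 | Col. 3 | Col. 4 | Col. 5 |
|---|---|---|---|---|---|
| 2026.02 | 65.49 | - | - | - | - |
| 2026.02 | 63.19 | - | - | - | - |
| 2026.02 | 63.13 | - | - | - | - |
| 2026.02 | 59.62 | - | - | - | - |
| 2026.01 | 59.6 | - | - | - | - |
| 2026.02 | 59.11 | - | - | - | - |
| 2026.02 | 58.78 | - | - | - | - |
| 2026.01 | 58.4 | - | - | - | - |
| 2026.02 | 55.66 | - | - | - | - |
| 2026.01 | 54.8 | - | - | - | - |
| 2026.01 | 54.1 | - | - | - | - |
| 2026.02 | 53.48 | - | - | - | - |
| 2026.02 | 53.03 | - | - | - | - |
|  | 48.3 | - | - | - | - |
| 2026.02 | 38.33 | - | - | - | - |
| 2026.01 | 34.7 | - | - | - | - |
| 2026.02 | 16.37 | - | - | - | - |
| 2026.02 | 8.71 | - | - | - | - |
| 2026.02 | 4.18 | - | - | - | - |
| 2026.02 | 3.81 | - | - | - | - |
| 2026.02 | 0.34 | - | - | - | - |
| 2024.09 | - | 74.5 | 80.8 | 92 | - |
| 2024.09 | - | 81.3 | 75.3 | 108 | - |
| 2024.09 | - | 76.3 | 76.5 | 100 | - |
| 2024.09 | - | 68.5 | 71.5 | 96 | - |
| 2024.09 | - | 64 | 64.5 | 99 | - |
| 2024.09 | - | 71.3 | 75.3 | 95 | - |
| 2024.09 | - | 77 | 77.3 | 100 | - |
| 2024.09 | - | 70.8 | 69 | 103 | - |
| 2024.09 | - | 71.8 | 67.5 | 106 | - |
| 2026.02 | - | - | - | - | 28.48 |
| 2026.02 | - | - | - | - | 30.16 |
| 2026.02 | - | - | - | - | 29.03 |
| 2026.02 | - | - | - | - | 39.49 |
| 2026.02 | - | - | - | - | 38 |
| 2026.02 | - | - | - | - | 36.03 |
| 2026.02 | - | - | - | - | 43.89 |
| 2026.02 | - | - | - | - | 38.74 |
| 2026.02 | - | - | - | - | 32.32 |
| 2026.02 | - | - | - | - | 39.56 |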