Question Answering on HotpotQA (test)

67.5 Ans EM

AISO

[Chart: results over time, Sep 17, 2019 to Mar 9, 2026]
Updated 1mo ago

Evaluation Results

Method Links
2023.05  67.5   80.5   61.2   86     44.9   72     -      -
2023.05  67.4   80.1   61.3   85.3   45.7   71.7   -      -
2023.05  67     79.5   59.4   84.3   44.4   70.8   -      -
2023.05  66.3   79.9   57.2   82.6   43.1   69.8   -      -
-        64.8   77.8   56.1   81.8   41     67.8   -      -
2023.05  62.5   75.9   51     78.9   36     63.9   -      -
2023.05  62.3   75.3   57.5   80.9   41.8   66.6   -      -
2022.05  62.3   75.3   -      -      -      -      -      -
2023.05  60     73     49.1   76.4   35.4   61.2   -      -
2022.05  60     73     -      -      -      -      -      -
2023.05  59.7   71.4   51     77.4   37.9   62.3   -      -
2022.05  58.1   71.1   -      -      -      -      -      -
-        51.6   64.1   40.9   71.4   26.1   51.3   -      -
2022.12  51.4   62.9   -      -      -      -      -      -
2022.12  49.3   60.7   -      -      -      -      -      -
2019.09  45.3   57.3   38.7   70.8   25.1   47.6   -      -
-        45.3   57.3   38.7   70.8   25.1   47.6   -      -
2026.03  43.2   -      -      -      -      -      -      -
2022.05  42.1   51.7   -      -      -      -      -      -
-        37.9   49.8   30.7   64.6   18     39.1   -      -
2026.01  37.4   19.72  -      -      -      -      -      36.6
-        37.1   48.9   22.8   57.7   12.4   34.9   -      -
2023.05  37.1   48.9   22.8   57.7   12.4   34.9   -      -
2022.12  37.1   48.4   -      -      -      -      -      -
2022.12  35.1   -      -      -      -      -      -      -
2026.01  31.4   16.41  -      -      -      -      -      29.4
2019.09  30.6   40.3   16.7   47.3   10.9   27     -      -
2023.05  30.6   40.3   16.7   47.3   10.9   27     -      -
2026.01  30.4   6      -      -      -      -      -      27.8
2022.12  30.3   -      -      -      -      -      -      -
2026.01  26.6   34.92  -      -      -      -      -      33.4
2026.01  25.2   32.45  -      -      -      -      -      31.18
2026.03  24.6   -      -      -      -      -      -      -
2026.01  24.4   31.77  -      -      -      -      -      31.2
-        24     32.9   3.9    37.7   1.9    16.2   -      -
2026.03  22.6   -      -      -      -      -      -      -
2026.01  21.2   13.39  -      -      -      -      -      23.2
2020.07  -      82.2   -      88.5   -      74.2   -      -
2020.07  -      81.6   -      88.7   -      73.9   -      -
2020.07  -      81.2   -      88.3   -      73.2   -      -
2020.07  -      81.2   -      89.1   -      73.6   -      -
2022.03  -      -      -      -      -      -      7.2    -
2022.03  -      -      -      -      -      -      9.13   -
2022.03  -      -      -      -      -      -      11.57  -
2022.03  -      -      -      -      -      -      22.94  -
2022.03  -      -      -      -      -      -      23.36  -
2022.12  -      53.5   -      -      -      -      -      -
-        -      -      -      -      -      -      -      29.43
2025.02  -      -      -      -      -      -      -      24.59
2025.02  -      -      -      -      -      -      -      25.36
2025.02  -      -      -      -      -      -      -      24.89
2025.02  -      -      -      -      -      -      -      32.21
2026.01  -      -      -      -      -      -      -      57.4
2026.01  -      -      -      -      -      -      -      45.1
2026.01  -      -      -      -      -      -      -      21.8
2026.01  -      -      -      -      -      -      -      73.3
2026.01  -      -      -      -      -      -      -      76.3
2026.01  -      -      -      -      -      -      -      63.7
2026.01  -      -      -      -      -      -      -      77.5
2026.01  -      -      -      -      -      -      -      21.5
2026.01  -      -      -      -      -      -      -      21.9
2026.01  -      -      -      -      -      -      -      27.1
2026.01  -      -      -      -      -      -      -      68.3
2026.01  -      -      -      -      -      -      -      56.3
2026.01  -      -      -      -      -      -      -      67.6
2026.01  -      -      -      -      -      -      -      70.1
2026.03  -      -      -      -      -      -      -      60.36
2026.03  -      -      -      -      -      -      -      67.62
2026.03  -      -      -      -      -      -      -      65.61
2026.03  -      -      -      -      -      -      -      68.76
2026.03  -      -      -      -      -      -      -      69.92
2026.03  -      -      -      -      -      -      -      73.42
2026.03  -      -      -      -      -      -      -      74.37
2026.03  -      -      -      -      -      -      -      75.43