Commonsense Reasoning on OpenBookQA

91 Accuracy

Fine-tuning

[Chart: Accuracy over time, Jun 10, 2023 to Apr 9, 2026]
Updated 8d ago

Evaluation Results

Accuracy  Date
91        2026.04
90.7      2026.04
87.5      2023.06
87        2023.06
86.1      2023.06
85        2023.06
85        2023.06
83.5      2023.06
82.6      2026.04
82.6      2026.04
81.2      2026.04
78.9      2026.04
77.2      2023.06
76.6      2023.06
74.8      2023.06
73.9      2026.04
73.3      2023.06
73        2026.04
71.9      2023.06
68.4      2025.12
68        2025.12
66        2025.12
66        2026.04
65.5      2025.12
64        2025.12
63        2025.12
62        2025.12
62        2025.12
59        2025.12
59        2025.12
58        2025.12
57        2025.12
52        2025.12
51        2025.12
51        2025.12
48        2025.12
46        2026.02
46        2026.02
45        2023.06
44.7      2025.12
44        2025.12
44        2026.02
43        2026.03
43        2026.03
42.6      2026.03
42.6      2026.03
42.4      2026.02
42        2026.03
42        2026.03
42        2026.03
41.8      2026.03
41.6      2026.03
41.2      2026.03
40.8      2026.03
40.6      2026.03
38.6      2026.03
38.4      2025.11
38.2      2026.03
37.4      2026.03
36.6      2025.12
36        2025.11
35.6      2025.12
35        2025.12
32        2025.12
31        2026.03
29.6      2026.03
28.8      2026.03
28        2026.03
27.6      2026.03
26.6      2026.03
26.4