
Common Sense Reasoning on BoolQ

92.4 Accuracy

Fine-tuning

[Chart: accuracy over time; y-axis ticks at 69.3952, 75.3676, 81.34, 87.3124; x-axis from Mar 29, 2022 to Apr 21, 2026]
Updated 23d ago

Evaluation Results

Accuracy  Date
92.4      2022.03
91.4      2023.02
88        2023.02
85.3      2026.03
85.23     2023.02
84.8      2026.03
84.65     2023.02
83.9      2022.03
83.7      2023.02
83.7      2023.09
83.5      2023.02
83.1      2026.02
83        2026.01
82.14     2026.03
82.11     2026.01
82.07     2026.02
82        2025.03
81.1      2026.04
80.95     2026.01
80.61     2026.03
80.55     2026.02
80.3      2026.02
80        2022.03
79.3      2023.02
79.3      2025.03
79        2026.03
78.62     2022.03
78.2      2026.01
78.2      2026.03
78.13     2023.02
78.1      2026.03
77.97     2023.09
77.9      2026.02
77.1      2026.03
76.94     2026.03
76.75     2026.01
76.73     2023.02
76.5      2026.03
76.5      2026.03
76.1      2026.03
75.91     2023.09
75.8      2026.03
75.69     2026.02
75.53     2025.05
75.5      2025.05
75.35     2025.05
75.26     2026.03
75.22     2026.02
75        2025.05
74.92     2025.02
74.79     2025.03
74.7      2026.03
74.7      2026.02
74.62     2025.05
74.6      2026.01
74.56     2025.05
74.5      2025.05
74.19     2025.09
74.19     2026.03
74.01     2026.03
74        2023.09
73.9      2026.01
73.79     2025.09
73.58     2025.05
73.36     2026.03
73.3      2023.09
73.2      2026.03
73.18     2025.05
73.1      2026.04
73.06     2025.09
73        2026.03
72.95     2026.01
72.84     2025.09
72.83     2023.09
72.8      2025.02
72.75     2025.09
72.63     2026.03
72.58     2025.05
72.54     2025.09
72.28     2025.05
72.08     2026.01
72.05     2025.05
72        2025.05
71.8      2025.03
71.5      2025.05
71.47     2025.05
71.38     2026.02
71.25     2025.05
71.16     2026.04
71.07     2025.05
71.04     2026.01
70.95     2025.05
70.8      2025.02
70.77     2025.05
70.73     2025.05
70.52     2026.01
70.49     2026.03
70.4      2026.01
70.3      2026.04
70.28
Showing 100 of 240 rows