
Common Sense Reasoning on BoolQ

92.4 Accuracy

Fine-tuning

[Chart: accuracy over time, Mar 29, 2022 to Mar 18, 2026; y-axis ticks at 68.792, 74.921, 81.05, 87.179]
Updated 19d ago

Evaluation Results

Accuracy  Date     Method  Links
92.4      2022.03
91.4      2023.02
88        2023.02
85.3      2026.03
85.23     2023.02
84.8      2026.03
84.65     2023.02
83.9      2022.03
83.7      2023.02
83.7      2023.09
83.5      2023.02
83.1      2026.02
83        2026.01
82.14     2026.03
82.11     2026.01
82.07     2026.02
82        2025.03
81.1      2026.01
80.61     2026.03
80.55     2026.02
80.3      2026.02
80        2022.03
79.3      2023.02
79.3      2025.03
79        2026.03
78.62     2022.03
78.2      2026.01
78.2      2026.03
78.13     2023.02
78.1      2026.03
77.97     2023.09
77.9      2026.02
77.1      2026.03
76.94     2026.03
76.75     2026.01
76.73     2023.02
76.5      2026.03
76.5      2026.03
76.1      2026.03
75.91     2023.09
75.8      2026.03
75.69     2026.02
75.53     2025.05
75.5      2025.05
75.35     2025.05
75.26     2026.03
75.22     2026.02
75        2025.05
74.92     2025.02
74.79     2025.03
74.7      2026.03
74.7      2026.02
74.62     2025.05
74.6      2026.01
74.56     2025.05
74.5      2025.05
74.19     2026.03
74.01     2026.03
74        2023.09
73.9      2026.01
73.79     2025.05
73.36     2026.03
73.3      2023.09
73.2      2026.03
73.18     2025.05
73.1      2026.03
72.95     2026.01
72.84     2023.09
72.8      2025.02
72.75     2026.03
72.58     2025.05
72.54     2025.05
72.08     2026.01
72.05     2025.05
72        2025.05
71.8      2025.03
71.5      2025.05
71.47     2025.05
71.38     2026.02
71.25     2025.05
71.16     2025.05
71.04     2026.01
70.95     2025.05
70.8      2025.02
70.77     2025.05
70.73     2025.05
70.52     2026.01
70.49     2026.03
70.4      2026.01
70.3      2025.05
70.18     2025.05
70        2023.06
69.9      2026.01
69.88     2026.01
69.88     2025.05
69.85     2026.03
69.85     2025.05
69.82     2025.05
69.8      2025.05
69.7
Showing 100 of 212 rows