
Common Sense Reasoning on PIQA

Accuracy: 91.89

Llama-3.3-70B-Instruct

[Chart: accuracy over time, Jul 2025 to May 2026]

Evaluation Results

Rank  Accuracy (%)  Other  Date
1     91.89         -      2026.04
2     89.88         -      2026.04
3     88.41         -      2026.04
4     87.21         -      2026.04
5     87.11         -      2026.04
6     86.4          -      2026.03
7     83            -      2025.07
8     82.2          -      2025.07
9     81.5          -      2025.07
10    81.4          -      2025.10
11    80.8          0.6    2026.04
12    80.6          -      2026.04
13    80.4          -      2026.04
14    80.3          -      2026.04
15    80.3          -      2025.10
16    80.2          -0.3   2026.04
17    80.1          -      2026.04
18    80.1          -      2025.07
19    79.9          -      2026.04
20    79.7          -      2026.04
21    79.4          -      2026.04
22    78.9          -      2025.10
23    78.8          1.1    2025.07
24    78.8          -      2025.10
25    78.6          0      2026.03
26    78.5          -      2026.03
27    78.2          -      2026.03
28    78.1          -      2025.10
29    77.9          -3.4   2026.03
30    77.9          -      2026.03
31    77.45         -      2026.03
32    77.1          -      2026.03
33    77            -      2026.04
34    76.8          -      2025.10
35    76            2.8    2026.03
36    76            -      2026.03
37    76            -      2026.04
38    75.7          -      2026.03
39    75.68         -      2026.03
40    75.52         -      2026.03
41    75.41         -      2026.03
42    75.23         -      2026.03
43    75.19         -      2026.03
44    75.14         -      2026.03
45    75.08         -      2026.04
46    74.9          -      2026.03
47    74.86         -      2026.03
48    74.65         -      2026.03
49    74.54         -      2026.03
50    74.48         -      2026.03
51    74.48         -      2026.03
52    74.2          -      2025.09
53    73.6          -      2026.04
54    73.29         -      2026.03
55    73            -      2025.09
56    73            -      2026.02
57    72.74         -      2026.04
58    72.63         -      2026.04
59    72.58         -      2026.02
60    72.25         -      2026.02
61    72.2          -      2026.02
62    72.03         -      2026.04
63    72.03         -      2026.02
64    71.87         -      2026.03
65    71            -      2025.09
66    69.7          -      2025.09
67    69.6          -      2026.02
68    69.26         -      2026.02
69    69.26         -      2026.05
70    68.9          -      2026.02
71    68.88         -      2026.02
72    68.77         -      2026.02
73    68.61         -      2026.05
74    67.4          -      2026.05
75    66.8          -      2026.04
76    65.3          -      2026.04
77    65.3          -      2026.03
78    63.93         -      2026.03
79    63.6          -      2026.03
80    63.44         -      2025.09
81    63.3          -      2026.04
82    63.2          -      2026.04
83    63.1          -      2026.03
84    63            -      2026.04
85    62.9          -      2026.04
86    62.9          -      2026.04
87    62.9          -      2026.04
88    62.7          -      2026.03
89    62.4          -      2026.03
90    60.72         -      2026.03
91    60.55         -      -
92    59.6          -      2026.03
93    58            -      2026.05
94    56.1          -      2026.03
95    56            -      2026.03
96    49.72         -      -
97    46.57         -      2026.03
98    26.06         -      2026.03
99    18.3          -      2026.03
100   0             -      -
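As a rough illustration of what the Accuracy column measures, here is a minimal Python sketch assuming the standard two-choice PIQA format: a `goal`, two candidate solutions `sol1` and `sol2`, and a 0/1 `label` marking the correct one. The `score` callable and the `prefer_shorter` scorer are hypothetical stand-ins for however a given entry ranks the candidates (for example, a model's log-likelihood of each solution); individual entries on this leaderboard may use different prompts or scoring rules.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PiqaExample:
    goal: str
    sol1: str
    sol2: str
    label: int  # 0 if sol1 is the correct solution, 1 if sol2 is


def piqa_accuracy(
    examples: Sequence[PiqaExample],
    score: Callable[[str, str], float],
) -> float:
    """Percentage of examples where the higher-scored solution is the labeled one."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = 0
    for ex in examples:
        # Hypothetical scorer: in practice this might be a model's
        # log-likelihood of the solution text given the goal.
        s1 = score(ex.goal, ex.sol1)
        s2 = score(ex.goal, ex.sol2)
        prediction = 0 if s1 >= s2 else 1  # ties are broken toward sol1
        correct += int(prediction == ex.label)
    return 100.0 * correct / len(examples)


def prefer_shorter(goal: str, solution: str) -> float:
    """Toy stand-in scorer (hypothetical): shorter solutions score higher."""
    return -float(len(solution))


examples = [
    PiqaExample("Dry your hands", "Use a towel.", "Dip them in a bucket of water.", 0),
    PiqaExample("Open a stuck jar", "Hit it with a hammer until it breaks.", "Twist the lid.", 1),
]
print(piqa_accuracy(examples, prefer_shorter))  # prints 100.0
```

Because each item has exactly two candidates, a scorer that guesses at random lands near 50% in expectation, which is a useful reference point when reading the lower rows of the table.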