Commonsense reasoning on PIQA 1.0 (test)

82.21 Accuracy

Mistral-7B + Uniform

[Chart: accuracy over time, Feb 12, 2024 to Feb 16, 2025]
Updated 3d ago

Evaluation Results

Method  Links
Date     Accuracy
2024.02  82.21
2024.02  82.21
2024.02  82.1
2024.02  81.99
2024.02  81.94
2024.02  79.54
2024.02  79.43
2024.02  79
2024.02  78.84
2024.02  78.18
2024.02  77.53
2024.02  77.42
2024.02  77.15
2024.02  76.71
2024.02  70.67
2025.02  60
2025.02  58
2025.02  58
2025.02  58
2025.02  58
2025.02  57
2025.02  57
2025.02  57
2025.02  56
2025.02  56
2025.02  56
2025.02  55
2025.02  55
2025.02  54
2025.02  54
2025.02  54
2025.02  54
2025.02  54
2025.02  54
2025.02  53
2025.02  53
2025.02  53
2025.02  53
2025.02  53
2025.02  53
2025.02  53
2025.02  52
2025.02  52
2025.02  52
2025.02  52
2025.02  52
2025.02  51
2025.02  50