
Accuracy Evaluation on BBH General Reasoning

88.7 BBH General Reasoning Accuracy

Kimi-K2 Base

[Chart: BBH General Reasoning accuracy over time, Nov 28, 2025 to Feb 1, 2026]
Updated 3d ago

Evaluation Results

Date      Accuracy
2026.01   88.7
2026.01   88.7
2026.01   88.5
2026.01   88.2
2026.02   88
2026.02   86
2026.02   84
2026.02   83.5
2026.02   83
2026.02   81
2026.02   81
2026.02   79
2026.02   76.5
2026.02   72
2026.02   69.5
2026.02   69.5
2026.02   67.5
2026.02   65
2026.02   65
2026.02   61
2025.11   60.11
2026.02   59.5
2026.02   58
2026.02   57.5
2026.02   57.5
2026.02   56.5
2026.02   56.5
2026.02   56
2026.02   55.5
2026.02   55
2026.02   55
2026.02   54
2026.02   53.5
2026.02   53
2026.02   52
2025.11   51.7
2026.02   51
2025.11   50.34
2026.02   46.5
2025.11   41.28
2026.02   36.5
2025.11   32.27
2025.11   18.15