
General Reasoning on BIG-Bench Hard

91.1 Accuracy

Qwen 3 VL 32B Think

[Chart: accuracy over time, May 17, 2023 to Jan 13, 2026]
Updated 2d ago

Evaluation Results

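| Date | Accuracy | (unlabeled) | (unlabeled) | (unlabeled) |
|---|---|---|---|---|
| 2025.12 | 91.1 | - | - | - |
| 2025.12 | 90.6 | - | - | - |
| 2025.12 | 89.8 | - | - | - |
| 2025.12 | 89.7 | - | - | - |
| 2025.12 | 89.1 | - | - | - |
| 2025.12 | 89 | - | - | - |
| 2025.12 | 88.8 | - | - | - |
| 2025.12 | 88.6 | - | - | - |
| 2025.12 | 87.6 | - | - | - |
| - | 86.8 | - | - | - |
| 2025.12 | 86.6 | - | - | - |
| - | 86.2 | - | - | - |
| 2025.12 | 85.6 | - | - | - |
| 2025.12 | 84.4 | - | - | - |
| 2025.12 | 84.1 | - | - | - |
| 2025.12 | 84 | - | - | - |
| 2025.12 | 83.7 | - | - | - |
| 2025.12 | 82.4 | - | - | - |
| 2026.01 | 82.3 | 3,108.44 | - | - |
| 2025.12 | 82.1 | - | - | - |
| 2026.01 | 81.9 | 6,753.47 | - | - |
| - | 81.3 | - | - | - |
| 2025.12 | 80.9 | - | - | - |
| 2025.12 | 80.4 | - | - | - |
| 2023.05 | 78.1 | - | - | - |
| - | 77.1 | - | - | - |
| 2025.12 | 73.7 | - | - | - |
| - | 73.5 | - | - | - |
| 2026.01 | 72.6 | - | - | - |
| 2025.12 | 71.2 | - | - | - |
| 2025.12 | 69.3 | - | - | - |
| 2023.05 | 69.1 | - | - | - |
| 2025.12 | 69 | - | - | - |
| 2025.12 | 68.8 | - | - | - |
| 2026.01 | 68.7 | 2,340.05 | - | - |
| 2023.05 | 68.1 | - | - | - |
| 2026.01 | 67.4 | 4,988.84 | - | - |
| 2025.12 | 66 | - | - | - |
| 2023.07 | 65.7 | - | - | - |
| 2025.12 | 65.6 | - | - | - |
| 2023.05 | 65.2 | - | - | - |
| 2023.05 | 64.9 | - | - | - |
| 2023.05 | 64.6 | - | - | - |
| 2023.05 | 62.4 | - | - | - |
| 2023.05 | 62.4 | - | - | - |
| 2025.12 | 61.2 | - | - | - |
| 2023.05 | 59.3 | - | - | - |
| 2024.09 | 58.9 | - | - | - |
| 2025.12 | 57 | - | - | - |
| 2024.09 | 56.5 | - | - | - |
| 2026.01 | 54.5 | - | - | - |
| 2023.07 | 52.3 | - | - | - |
| 2023.07 | 51.2 | - | - | - |
| 2025.12 | 51 | - | - | - |
| 2023.05 | 49.2 | - | - | - |
| 2025.12 | 43.8 | - | - | - |
| 2026.01 | 42.4 | 4,082.96 | - | - |
| 2025.12 | 42.2 | - | - | - |
| 2026.01 | 41.5 | - | - | - |
| 2026.01 | 39.2 | 2,280.52 | - | - |
| 2024.09 | 39.1 | - | - | - |
| 2024.09 | 29.3 | - | - | - |
| 2024.09 | 23.5 | - | - | - |
| 2024.09 | 23.4 | - | - | - |
| 2024.09 | 21.5 | - | - | - |
| 2024.09 | 20.8 | - | - | - |
| 2024.09 | 19.8 | - | - | - |
| 2024.09 | 8.7 | - | - | - |
| 2026.02 | - | - | 79.68 | 83.25 |
| 2026.02 | - | - | 78.39 | 82.58 |
| 2026.02 | - | - | 76.76 | 81.87 |
| 2026.02 | - | - | 57.74 | 74.24 |
| 2026.02 | - | - | 78.87 | 82.6 |
| 2026.02 | - | - | 80.54 | 83.26 |
| 2026.02 | - | - | 81.04 | 83.96 |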