
Logical Reasoning on BBH (test)

88.29 Top@1 Accuracy

FAA

[Chart: scores over time, Dec 26, 2025 to Jan 7, 2026]
Updated 4d ago

Evaluation Results

Method    Links
2025.12   88.29   -       -
2025.12   88.27   -       -
2025.12   88.21   -       -
2025.12   88.17   -       -
2025.12   88.07   -       -
2026.01   85.25   92.34   1,287
2026.01   83.84   58.65   817
2026.01   73.33   69.03   1,445
2026.01   67.68   38.24   796
2025.12   67.1    -       -
2025.12   67.09   -       -
2025.12   67.05   -       -
2025.12   66.89   -       -
2025.12   66.74   -       -
2026.01   65.86   44.65   1,005
2026.01   63.03   24.46   541
2026.01   61.21   16.08   362
2025.12   56.89   -       -
2025.12   56.87   -       -
2025.12   56.85   -       -
2025.12   56.82   -       -
2025.12   56.81   -       -
2025.12   43.71   -       -
2025.12   43.68   -       -
2025.12   43.67   -       -
2025.12   43.65   -       -
2025.12   43.62   -       -