
General Reasoning on BBH (test)

Top result: 81.8 Accuracy (ICL+FT)

[Chart: Accuracy over time; Dec 22, 2025]

Evaluation Results

Date     Accuracy  Other scores (four columns, labels not given)
2025.12  81.8      -      -      -      -
2025.12  78.7      -      -      -      -
2025.12  76.6      -      -      -      -
2025.12  72.8      -      -      -      -
2025.12  72.6      -      -      -      -
2025.12  72.3      -      -      -      -
2025.12  69.7      -      -      -      -
2025.12  68.7      -      -      -      -
2025.12  67.5      -      -      -      -
2025.12  64.6      -      -      -      -
2025.12  64.2      -      -      -      -
2025.12  62.6      -      -      -      -
2025.12  57.1      -      -      -      -
2025.12  56.6      -      -      -      -
2025.12  55.3      -      -      -      -
2025.12  52.8      -      -      -      -
2025.12  50.7      -      -      -      -
2025.12  37.6      -      -      -      -
2025.12  37.2      -      -      -      -
2025.12  27.3      -      -      -      -
2023.06  -         60.8   64.1   56.4   45.9
2023.06  -         55.6   56.5   62.1   36.7
2023.06  -         71.1   56.9   61.1   44.4
2023.06  -         68.5   67.3   32.1   45.5
2023.06  -         65.5   56.9   23.5   45.6
2023.06  -         91.9   68.5   55.7   47.5
2023.06  -         64     74     79.2   68.5
2023.06  -         88.4   75.7   79.3   62.8
2023.06  -         93.3   75.5   80.9   66.4
2023.06  -         95.2   77.1   76.7   69.1