General Reasoning on BIG-bench

74.6 Accuracy @ t1

SFT based

[Chart omitted; date label: May 22, 2025]
Updated 3d ago

Evaluation Results

Date      Accuracy @ t1   Second value   Difference
2025.05   74.6            -              -
2025.05   63.4            72             8.6
2025.05   63.4            75.2           11.8
2025.05   61.6            -              -
2025.05   61.2            75.6           14.4
2025.05   61.2            74.4           13.2
2025.05   61.2            78.4           17.2
2025.05   61.2            67             5.8
2025.05   61.2            74.8           13.6
2025.05   61.2            75             13.8
2025.05   54.4            -              -
2025.05   54.4            67.2           12.8
2025.05   54.4            68             13.6
2025.05   48              -              -
2025.05   48              67             19
2025.05   48              64.4           16.4
2025.05   38.2            -              -
2025.05   38.2            52.4           14.2
2025.05   38.2            71.2           33
2025.05   38.2            45.4           7.2
2025.05   38.2            63             24.8
2025.05   38.2            59.6           21.4
2025.05   37.8            -              -
2025.05   36.6            -              -
2025.05   36.6            43.8           7.2
2025.05   36.6            51.6           15
2025.05   36.6            71.1           34.5
2025.05   36.6            50.2           13.6
2025.05   36.6            48.4           11.8
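
In rows that report three numbers, the last value appears to be the second value minus the first (for example 63.4, 75.2 and 11.8). The page does not label these columns, so the sketch below treats that relationship as an assumption and checks it on a few rows copied from the results above. The names `gain` and `consistent` are hypothetical helpers, not part of any leaderboard API.

```python
# Each tuple is (first score, second score, reported difference),
# copied from rows of the results above.
rows = [
    (63.4, 72.0, 8.6),
    (63.4, 75.2, 11.8),
    (61.2, 78.4, 17.2),
    (36.6, 71.1, 34.5),
]


def gain(first: float, second: float) -> float:
    """Difference between the two scores, rounded to one decimal place."""
    return round(second - first, 1)


def consistent(row: tuple, tol: float = 0.05) -> bool:
    """True if the reported difference matches second - first within tol."""
    first, second, reported = row
    return abs((second - first) - reported) <= tol


# One boolean per row: does the assumed relationship hold?
checks = [consistent(r) for r in rows]
print(checks)
```

A tolerance is used instead of exact equality because the scores are one-decimal values stored as binary floats.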