General Reasoning on BIG-bench

Accuracy (General): 81.6

POES

Apr 13, 2026
Updated 4d ago

Evaluation Results

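Date | Accuracy (General) | | |
2026.04 | 81.6 | - | - | -
2026.04 | 75.8 | - | - | -
2026.04 | 74.8 | - | - | -
2026.04 | 74.6 | - | - | -
2026.04 | 71.9 | - | - | -
2026.04 | 71.3 | - | - | -
2026.04 | 70.9 | - | - | -
2026.04 | 63.3 | - | - | -
2026.04 | 61.2 | - | - | -
2026.04 | 59.3 | - | - | -
2026.04 | 56.6 | - | - | -
2026.04 | 55.8 | - | - | -
2026.04 | 55 | - | - | -
2026.04 | 54.6 | - | - | -
2026.04 | 53.8 | - | - | -
2026.04 | 53.3 | - | - | -
2026.04 | 52.1 | - | - | -
2026.04 | 52 | - | - | -
2026.04 | 51.7 | - | - | -
2026.04 | 51.2 | - | - | -
2026.04 | 51.2 | - | - | -
2026.04 | 50.8 | - | - | -
2026.04 | 50.1 | - | - | -
2026.04 | 49.6 | - | - | -
2026.04 | 49.6 | - | - | -
2026.04 | 49.5 | - | - | -
2026.04 | 48.6 | - | - | -
2026.04 | 48.2 | - | - | -
2026.04 | 46.7 | - | - | -
2026.04 | 46.7 | - | - | -
2026.04 | 46.5 | - | - | -
2026.04 | 46.5 | - | - | -
2026.04 | 46.4 | - | - | -
2026.04 | 46.4 | - | - | -
2026.04 | 44.6 | - | - | -
2026.04 | 42.5 | - | - | -
2025.05 | - | 61.2 | 75.6 | 14.4
2025.05 | - | 61.2 | 74.4 | 13.2
2025.05 | - | 74.6 | - | -
2025.05 | - | 61.2 | 78.4 | 17.2
2025.05 | - | 61.2 | 67 | 5.8
2025.05 | - | 61.2 | 74.8 | 13.6
2025.05 | - | 61.2 | 75 | 13.8
2025.05 | - | 63.4 | 72 | 8.6
2025.05 | - | 63.4 | 75.2 | 11.8
2025.05 | - | 38.2 | - | -
2025.05 | - | 38.2 | 52.4 | 14.2
2025.05 | - | 61.6 | - | -
2025.05 | - | 38.2 | 71.2 | 33
2025.05 | - | 38.2 | 45.4 | 7.2
2025.05 | - | 38.2 | 63 | 24.8
2025.05 | - | 38.2 | 59.6 | 21.4
2025.05 | - | 48 | - | -
2025.05 | - | 48 | 67 | 19
2025.05 | - | 48 | 64.4 | 16.4
2025.05 | - | 36.6 | - | -
2025.05 | - | 36.6 | 43.8 | 7.2
2025.05 | - | 37.8 | - | -
2025.05 | - | 36.6 | 51.6 | 15
2025.05 | - | 36.6 | 71.1 | 34.5
2025.05 | - | 36.6 | 50.2 | 13.6
2025.05 | - | 36.6 | 48.4 | 11.8
2025.05 | - | 54.4 | - | -
2025.05 | - | 54.4 | 67.2 | 12.8
2025.05 | - | 54.4 | 68 | 13.6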