
General Reasoning on BBEH

78.8 accuracy (Gemini 3-Pro)

[Chart: accuracy over time, Dec 2, 2025 to Mar 22, 2026]
Updated 24d ago

Evaluation Results

Date      Accuracy  Links
-         78.8      -
-         69        -
-         68.8      -
-         67.04     -
2026.02   66.63     -
2026.03   15.3      -
2026.03   15.2      -
2025.12   14.9      -
2026.03   14.9      -
2026.03   14        -
2026.03   12.9      -
2025.12   12.3      -
2025.12   12.3      -
2025.12   12.2      -
2025.12   11.8      -
2025.12   11.8      -
2026.03   11.8      -
2026.03   11.8      -
2026.03   11.6      -
2025.12   11.3      -
2025.12   11.2      -
2026.03   11.2      -
2025.12   10.8      -
2025.12   10.8      -
2025.12   10.5      -
2025.12   10.4      -
2026.03   10.4      -
2026.03   10.4      -
2026.03   10.1      -
2026.03   9.9       -
2026.03   9.1       -
2025.12   8.3       -
2026.03   8.2       -
2025.12   8.1       -
2026.03   5.4       -
2026.03   4.8       -
2026.03   4.4       -
2026.03   4.1       -
2026.03   2.3       -
2026.01   -44.5
2026.01   -33.8
2026.01   -9.9
2026.01   -12.9
2026.01   -13.1
2026.01   -22.1
2026.01   -15.6
2026.01   -18
2026.01   -27.4
2026.01   -34.1