General Reasoning on BBEH

Accuracy: 78.8

Gemini 3-Pro

Chart: scores over time (Nov 13, 2025 to May 16, 2026)
Updated 15d ago

Evaluation Results

Method  Links
Accuracy  (unlabeled)  Date
78.8      -
69        -
68.8      -
67.04     -
66.63     -            2026.02
15.3      -            2026.03
15.2      -            2026.03
14.9      -            2025.12
14.9      -            2026.03
14.9      -            2026.05
14        -            2026.03
12.9      -            2026.03
12.55     -            2026.05
12.3      -            2025.12
12.3      -            2025.12
12.3      -            2026.05
12.2      -            2025.12
12.19     -            2026.05
11.86     -            2025.11
11.8      -            2025.12
11.8      -            2025.12
11.8      -            2026.03
11.8      -            2026.03
11.75     -            2026.05
11.75     -            2026.05
11.68     -            2025.11
11.6      -            2026.03
11.59     -            2026.05
11.5      -            2025.11
11.35     -            2025.11
11.3      -            2025.12
11.22     -            2025.11
11.2      -            2025.12
11.2      -            2026.03
11.19     -            2025.11
11.13     -            2025.11
10.8      -            2025.12
10.8      -            2025.12
10.8      -            2026.05
10.75     -            2025.11
10.75     -            2026.05
10.6      -            2026.05
10.5      -            2025.12
10.4      -            2025.12
10.4      -            2026.03
10.4      -            2026.03
10.31     -            2026.05
10.22     -            2026.05
10.1      -            2026.03
9.93      -            2025.11
9.9       -            2026.03
9.1       -            2026.03
8.3       -            2025.12
8.3       -            2026.05
8.25      -            2026.05
8.23      -            2026.05
8.2       -            2026.03
8.1       -            2025.12
7.57      -            2025.11
5.4       -            2026.03
4.8       -            2026.03
4.4       -            2026.03
4.1       -            2026.03
2.3       -            2026.03
-         44.5         2026.01
-         33.8         2026.01
-         9.9          2026.01
-         12.9         2026.01
-         13.1         2026.01
-         22.1         2026.01
-         15.6         2026.01
-         18           2026.01
-         27.4         2026.01
-         34.1         2026.01