
Code Reasoning on CRUXEval (CoT Metrics)

Top score: 98.8 Input-CoT Accuracy (Gemini-3-Pro-preview)

[Chart: scores over time, Jan 22, 2026 to Mar 17, 2026]
Updated 1mo ago

Evaluation Results

Date      Input-CoT Accuracy   Other CoT metric
2026.03   98.8                 99.1
2026.03   98.4                 98
2026.03   96.5                 97.6
2026.03   96.2                 96.2
2026.03   93.5                 87
2026.03   92.2                 86.2
2026.03   91.1                 85.5
2026.03   87.4                 94
2026.03   87.1                 90.4
2026.03   86.8                 89.5
2026.03   82.1                 94.2
2026.03   80.5                 90.6
2026.03   78.8                 84
2026.03   76.9                 80.5
2026.03   76.5                 75.2
2026.03   75.6                 79.2
2026.01   73.8                 76.9
2026.03   71.4                 81.1
2026.03   70.8                 71.1
-         69.5                 79.5
2026.03   66.9                 66
-         65.8                 65.9
2026.03   65.6                 81.2
-         63.3                 67.1
-         62.1                 69
2026.03   62                   66.6
2026.03   62                   89.5
2026.01   61.3                 63.5
2026.01   60.6                 66.4
2026.03   57.6                 81.5
2026.03   57.1                 56.2
2026.01   56.5                 57.8
2026.01   56.5                 56
-         53.8                 60
2026.01   53.4                 46.1
-         53                   52.9
2026.03   52.6                 57.6
2026.01   52                   54.8
-         50.6                 48.8
2026.01   49.4                 43.9
2026.01   47.5                 55.6
-         47.3                 50.6
2026.01   46.1                 47.6
2026.03   45.8                 54.2
-         45.5                 60.9
2026.01   44                   38.8
2026.01   43.3                 43.9
-         42.6                 45.1
2026.03   42.5                 65.1
2026.01   39.9                 43
2026.01   39.5                 35.1
2026.01   39                   41
2026.01   36.1                 36.2
2026.01   35.6                 37.8
2026.03   33                   64.2
2026.03   15.2                 46.9