
Code Generation on HumanEval (Accuracy, Exit Position)

98.27 Accuracy

SASFT

[Chart: Accuracy over time, May 19, 2025 to May 26, 2026]
Updated 1d ago

Evaluation Results

Date      Accuracy   Exit Position
2026.05   98.27      -
2026.05   96.44      -
2026.05   95.87      -
2026.05   94.71      -
2026.05   93.52      -
2026.05   91.74      -
2026.05   91.46      -
2026.05   91.18      -
2026.05   90.91      -
2026.05   90.61      -
2026.05   90.48      -
2026.05   90.29      -
2026.05   90.2       -
2026.05   89.63      -
2026.05   89.26      -
2026.05   89.26      -
2026.05   89.26      -
2026.05   89.17      -
2026.05   89.13      -
2026.05   89.04      -
2026.05   88.94      -
2026.05   88.43      -
2026.05   86         -
2026.05   85.2       -
2026.05   85.1       -
2026.05   84.8       -
2026.05   84         -
2026.05   83.5       -
2026.05   82.5       -
2026.05   80.6       -
2026.05   80.5       -
2026.05   77.7       -
2026.05   76.1       -
2026.05   73.5       -
2026.05   73.3       -
2026.05   69.4       -
2026.05   67.8       -
2026.04   63.4       -
2026.04   63.1       7.72
2026.04   62.8       2.55
2026.05   60.37      -
2026.05   59.6       -
2026.05   59         -
2026.05   58.5       -
2026.05   58.15      -
2026.05   57.9       -
2026.05   57.9       -
2026.04   57.3       -
2026.04   57.3       2.16
2026.05   56.71      -
2026.05   56.1       -
2026.05   55.84      -
2026.04   55.5       2.16
2026.05   55.5       -
2026.05   55.46      -
2026.05   55         -
2026.05   54.17      -
2026.05   53.7       -
2026.05   53.7       -
2026.05   53.62      -
2026.05   53.5       -
2026.05   53.33      -
2026.05   53         -
2026.05   53         -
2026.05   53         -
2026.05   52.53      -
2026.05   52.4       -
2026.05   52.4       -
2026.05   51.8       -
2026.05   51.8       -
2026.05   51.67      -
2026.05   51.2       -
2026.05   51.2       -
2026.05   51.1       -
2026.05   50         -
2026.05   49.17      -
2026.05   48.33      -
2026.05   48.2       -
2025.10   47.6       -
2025.10   47.4       -
2025.10   47         -
2025.10   45.1       -
2025.10   45         -
2025.10   44.8       -
2026.05   43.9       -
2026.05   43.3       -
2026.05   43.3       -
2026.05   42.7       -
2026.05   41.5       -
2026.05   40.9       -
2026.05   40.9       -
2026.05   40.9       -
2026.05   40.2       -
2026.05   39.6       -
2026.05   39.6       -
2026.05   39.6       -
2025.05   39         -
2026.05   39         -
2026.05   37.8       -
2026.05   37.8       -
Showing 100 of 115 rows