
Aggregate Model Performance on Combined Benchmark Suite

[Figure: average score (y-axis, scale up to 100) over time (x-axis, Jan 12, 2026 to Mar 26, 2026), with a Baseline reference series.]
Updated 23d ago

Evaluation Results

Date (YYYY.MM)   Score
2026.02          100
2026.02          94.42
2026.02          85.71
2026.03          75
2026.03          71.5
2026.03          70.7
2026.03          70
2026.03          68.2
2026.03          67.7
2026.03          65.8
2026.03          64.1
2026.03          63.5
2026.03          63.1
2026.03          63.1
2026.03          62.8
2026.03          62.5
2026.03          61.6
2026.03          61.4
2026.03          60.5
2026.03          59.5
2026.03          57.5
2026.03          57.2
2026.01          54.79
2026.01          53.56
2026.01          53.43
2026.01          52.99
2026.01          52.71
2026.01          51.23
2026.01          50.89
2026.01          50.23
2026.01          49.84
2026.01          48.03
2026.01          46.72
2026.01          46.6
2026.01          43.92
2026.01          43.55
2026.01          43.11
2026.01          42.85
2026.01          41.73
2026.01          39.75
2026.01          39.16
2026.01          39.03
2026.01          38.98
2026.01          38.35
2026.01          34.72
2026.01          34.25
2026.01          34.25
2026.01          34.05
2026.01          33.45
2026.01          33.43
2026.01          31.63
2026.01          29.22
2026.01          26.53
2026.01          24.71
2026.03          23.2
2026.03          20
2026.01          8.64