Reasoning on BigBenchHard

[Figure: Accuracy (BigBenchHard) over time, Feb 2, 2026 to May 26, 2026; y-axis from 29.072 to 100; legend: Ll4-Sct]
Updated 7d ago

Evaluation Results

Date      Accuracy   Method   Links
2026.05   100
2026.05   99.19
2026.02   82.4
2026.04   78.2
2026.04   74.9
2026.04   74.2
2026.02   73
-         71.91
2026.02   71.38
2026.02   70.1
2026.04   69.9
2026.04   68.2
2026.04   66.7
2026.04   65.6
2026.04   61.4
2026.04   56.5
2026.04   52.9
2026.04   46.8
2026.04   45.4
2026.05   37.8
2026.05   37.7
2026.05   31.8