Coding Tasks on Claude Code Evaluation Set
[Chart: Number of Samples, Claude Code (Original): 26 on May 5, 2026]
Updated 28d ago
Evaluation Results
| Method | Number of Samples | Mean Delta | Wins | Ties | Losses | T-Statistic | P-Value | Effect Size (d) |
|---|---|---|---|---|---|---|---|---|
| Claude Code (Original), Comparison Baseline=Va..., 2026.05 | 26 | 0.002 | 3 | 21 | 2 | 0.031 | 0.9756 | 0.006 |
| Claude Code (Compiled), Comparison Baseline=Va..., 2026.05 | 23 | 0.265 | 7 | 16 | 0 | 2.837 | 0.0096 | 0.592 |
| Claude Code (Compiled), Comparison Baseline=Or..., 2026.05 | 22 | 0.274 | 7 | 15 | 0 | 2.82 | 0.0103 | 0.601 |
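The page does not say how the T-Statistic, P-Value, and Effect Size (d) columns are computed. The reported rows are numerically consistent with a paired t-test on per-task deltas, with d = t / sqrt(n) and a two-sided p-value at n - 1 degrees of freedom; that reading is an assumption, not something the source confirms. A minimal sketch under that assumption, using only the standard library (the function names `cohens_d_from_t`, `t_pdf`, and `two_sided_p` are hypothetical helpers introduced for illustration):

```python
import math


def cohens_d_from_t(t_stat, n):
    """Paired-samples effect size, assuming d = t / sqrt(n)."""
    return t_stat / math.sqrt(n)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value via Simpson's rule on the t density (steps must be even)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return 1 - 2 * area


# Rows copied from the results table: (n, wins, ties, losses, t, reported p, reported d)
rows = [
    (26, 3, 21, 2, 0.031, 0.9756, 0.006),
    (23, 7, 16, 0, 2.837, 0.0096, 0.592),
    (22, 7, 15, 0, 2.82, 0.0103, 0.601),
]

for n, wins, ties, losses, t_stat, p_reported, d_reported in rows:
    d = cohens_d_from_t(t_stat, n)
    p = two_sided_p(t_stat, n - 1)
    print(f"n={n} wins+ties+losses={wins + ties + losses} "
          f"d={d:.3f} (reported {d_reported}) p={p:.4f} (reported {p_reported})")
```

Because the inputs are rounded to three or four decimals, small differences in the last digit are expected. In every row, wins, ties, and losses sum to the number of samples.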