Agentic Execution on Kimi CLI (74 paired tasks)
Figure: test statistic (t/W), p-value, and significance level by method (May 5, 2026).
Evaluation Results
| Method | Date | Statistic (t or W; d for effect size) | p-value | Significance level |
|---|---|---|---|---|
| Wilcoxon signed-rank | 2026.05 | 22 | 0.005 | 0.01 |
| Non-tie only (n = 17) | 2026.05 | 3.449 | 0.0033 | 0.01 |
| Paired t-test | 2026.05 | 2.815 | 0.0063 | 0.01 |
| Cohen’s d (paired), effect size | 2026.05 | 0.327 | - | - |
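The statistics in this table can be computed from per-task paired scores, which the page does not publish. The sketch below assumes the paired Cohen's d is d_z (mean of the per-task differences divided by their sample standard deviation). For a paired t-test this gives d_z = t / √n, and indeed 2.815 / √74 ≈ 0.327, which is consistent with (though not proof of) the t-test and effect size both being computed over all 74 tasks. The helpers `paired_t_and_dz` and `wilcoxon_w` are hypothetical names for illustration, not the benchmark's own code. For the Wilcoxon statistic, zero differences are dropped and W is taken as the smaller of the two signed-rank sums, which is the convention `scipy.stats.wilcoxon` uses for a two-sided test.

```python
import math
import statistics


def paired_t_and_dz(a, b):
    """Hypothetical helper: paired t statistic and Cohen's d_z for per-task scores a[i], b[i]."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD of differences (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    d_z = mean_d / sd_d  # assumed effect-size definition; equals t / sqrt(n)
    return t, d_z


def wilcoxon_w(a, b):
    """Hypothetical helper: Wilcoxon signed-rank W = min(W+, W-).

    Zero differences (ties between the paired runs) are dropped; equal |d|
    values receive the average of the ranks they span. Returns (W, n_nonzero).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), len(diffs)


# Consistency check against the reported paired t-test: d_z = t / sqrt(n)
print(round(2.815 / math.sqrt(74), 3))  # prints 0.327
```

The p-values require the t distribution (df = n - 1) or the exact or normal-approximated Wilcoxon null distribution, which the standard library does not provide; `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` compute both statistic and p-value from the same paired inputs.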