Multi-Agent Collaboration Role Overstepping on SWE hard subset (dev)
[Chart: Overstepping Rate (<INFO>) for ChatDev over time, latest point dated Apr 3, 2026. Selectable metrics: Overstepping Rate (<INFO>), Overstepping Rate (INFO), Delta (%) (<INFO>), Delta (%) (INFO).]
Evaluation Results
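Method  | Configuration             | Date    | Overstepping Rate (<INFO>) | Overstepping Rate (INFO) | Delta (%) (<INFO>) | Delta (%) (INFO)
--------|---------------------------|---------|----------------------------|--------------------------|--------------------|-----------------
ChatDev | CEO Configuration=FT,...  | 2026.04 | 0                          | 0                        | -41.6              | -42.8
ChatDev | CEO Configuration=FT,...  | 2026.04 | 0.5                        | 0.5                      | -41.1              | -42.3
ChatDev | CEO Configuration=Base... | 2026.04 | 14.4                       | 15.6                     | -27.2              | -27.2
ChatDev | CEO Configuration=Base... | 2026.04 | 41.6                       | 42.8                     | -                  | -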
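The Delta columns appear to be each row's overstepping rate minus the rate of the last row (the Base configuration with the highest rate, whose Delta cells are empty); for example, 14.4 - 41.6 = -27.2 and 15.6 - 42.8 = -27.2. The page does not state this, so treat it as an inferred reading. The minimal Python sketch below recomputes the deltas under that assumption. The row labels (`FT-1`, `FT-2`, `Base-1`, `Base-2`) are hypothetical, since the configuration names are truncated on the source page.

```python
# Overstepping rates transcribed from the Evaluation Results table.
# Labels are hypothetical placeholders: the real configuration names are truncated.
rows = [
    ("FT-1", 0.0, 0.0),
    ("FT-2", 0.5, 0.5),
    ("Base-1", 14.4, 15.6),
    ("Base-2", 41.6, 42.8),  # assumed reference row (its Delta cells are "-")
]

# Assumption: the last row is the reference for both Delta columns.
_, base_no_info, base_info = rows[-1]


def delta(rate: float, baseline: float) -> float:
    """Difference from the reference rate, rounded to one decimal place."""
    return round(rate - baseline, 1)


# Recompute (<INFO>, INFO) deltas for every non-reference row.
deltas = {
    name: (delta(r_no_info, base_no_info), delta(r_info, base_info))
    for name, r_no_info, r_info in rows[:-1]
}

for name, (d_no_info, d_info) in deltas.items():
    print(f"{name}: {d_no_info:+.1f} (<INFO>), {d_info:+.1f} (INFO)")
```

Rounding to one decimal hides floating-point noise (e.g. 14.4 - 41.6 evaluates to -27.200000000000003) so the results match the one-decimal values shown in the table.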