Multi-Agent Collaboration Role Overstepping on SWE total full set (dev)
[Chart: Overstepping Rate (<INFO>) for ChatDev, Apr 3, 2026. Series: Overstepping Rate (<INFO>), Overstepping Rate (INFO), Delta (%) (<INFO>), Delta (%) (INFO).]
Updated 13d ago
Evaluation Results

Method   Configuration              Date     Overstepping Rate (<INFO>)  Overstepping Rate (INFO)  Delta (%) (<INFO>)  Delta (%) (INFO)
ChatDev  CEO Configuration=FT,...   2026.04  0.2                         0.2                       -43.2               -44.4
ChatDev  CEO Configuration=FT,...   2026.04  0.65                        0.65                      -42.75              -43.95
ChatDev  CEO Configuration=Base...  2026.04  15.6                        16.2                      -27.8               -28.4
ChatDev  CEO Configuration=Base...  2026.04  43.4                        44.6                      -                   -
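The Delta (%) values are consistent with each row's overstepping rate minus the rate of the last row, the only row whose Delta is "-". The page does not state this definition, so the sketch below treats it as an assumption and simply recomputes the differences from the listed rates. The row labels (`row_1`, `baseline`, and so on) are hypothetical, since the page truncates the configuration names.

```python
# Overstepping rates copied from the results listing, as
# (rate under <INFO>, rate under INFO) for each row.
# Row labels are hypothetical; the page truncates the real configuration names.
rows = {
    "row_1": (0.2, 0.2),
    "row_2": (0.65, 0.65),
    "row_3": (15.6, 16.2),
    "baseline": (43.4, 44.6),  # the row whose Delta is shown as "-"
}


def deltas(rows, baseline_key="baseline", ndigits=2):
    """Return each row's rate minus the baseline rate, per column.

    Assumption: Delta is a plain difference against the baseline row,
    rounded to two decimals to hide floating-point noise.
    """
    base = rows[baseline_key]
    out = {}
    for name, rates in rows.items():
        if name == baseline_key:
            continue  # the baseline has no delta against itself
        out[name] = tuple(round(r - b, ndigits) for r, b in zip(rates, base))
    return out


result = deltas(rows)
for name, (d_lt_info, d_info) in result.items():
    print(f"{name}: {d_lt_info} / {d_info}")
```

Under this assumption the recomputed differences match the listed Delta values for all three non-baseline rows, which suggests the column is a difference in percentage points rather than a relative change.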