Alignment Evaluation on Open-ended questions
Best Win Rate: 68.9 (Single-Agent)
[Chart: Win Rate over time; data point dated Mar 11, 2026]
Updated 1mo ago
Evaluation Results
Method        Setting                      Date     Win Rate
Single-Agent  Comparison=vs. Base, D...    2026.03  68.9
Single-Agent  Comparison=vs. Base, D...    2026.03  68.7
Multi-Agent   Comparison=vs. Base, D...    2026.03  63.4
Multi-Agent   Comparison=vs. Base, D...    2026.03  51.8
Multi-Agent   Comparison=vs. Single-...    2026.03  40.4
Multi-Agent   Comparison=vs. Single-...    2026.03  38.4