Multi-agent interaction and social reasoning on Werewolf MultiAgentBench
[Chart: Task Performance. Top result: ETI, 55.75 (Apr 21, 2026). Metrics shown: Task Performance, Coordination. Updated 1mo ago.]
Evaluation Results
Method                              Date      Task Performance   Coordination
ETI (Agent=QWEN, Trait Sour...)     2026.04   55.75              65.2
ETI (Agent=QWEN, Trait Sour...)     2026.04   49.97              59.52
QWEN (Trait Source=none)            2026.04   43.28              60.2
ETI (Agent=GPT, Trait Sourc...)     2026.04   36.46              57.56
ETI (Agent=GPT, Trait Sourc...)     2026.04   29.54              55.56
GPT (Trait Source=none)             2026.04   25.26              54.32