Werewolf

Benchmarks

Task Name	Dataset Name	SOTA Result
Social Deduction Game Play	Werewolf against human players (test)	Average Votes: 3.21	7
Multi-agent interaction and social reasoning	Werewolf MultiAgentBench	Task Performance: 55.75	6
Social Deduction Game Gameplay	Werewolf	Village Team Win Rate: 29.1	6
Dimensional Emotion Recognition	Werewolf-XL	Arousal: 39.8	5
