
Werewolf

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Social Deduction Game Play | Werewolf against human players (test) | Average Votes: 3.21 | 7
Multi-agent interaction and social reasoning | Werewolf MultiAgentBench | Task Performance: 55.75 | 6
Social Deduction Game Gameplay | Werewolf | Village Team Win Rate: 29.1 | 6
Dimensional Emotion Recognition | Werewolf-XL | Arousal: 39.8 | 5