Social Skill on Sotopia
[Chart: Primary Metric. DITTO: 47, May 19, 2026.]
Updated 13d ago
Evaluation Results

| Method | Details | Date | Primary Metric |
|---|---|---|---|
| DITTO | Backbone=Qwen3-VL-8B-I... | 2026.05 | 47 |
| GRPO | Backbone=Qwen3-VL-8B-I... | 2026.05 | 42.3 |
| Sotopia-RL-7B | | 2026.05 | 31.2 |
| GPT-5-nano | | 2026.05 | 31 |
| GPT-5.4 | | 2026.05 | 30 |
| Qwen3-VL-8B-Instruct | Role=Base | 2026.05 | 27.7 |