Role-playing Evaluation on RoleChat (val)
[Chart: Overall MSE over time. Plotted point: RoleJudge, Overall MSE 0.21, Apr 15, 2026.]
Metrics: Overall MSE, Pearson Correlation E-A, Pearson Correlation S-A
Updated 1mo ago
Evaluation Results
Method         Date      Overall MSE   Pearson Correlation E-A   Pearson Correlation S-A
RoleJudge      2026.04   0.21          0.81                      0.62
Gemini3 Pro    2026.04   0.68          0.68                      0.59
GPT-4o-audio   2026.04   1.42          0.51                      0.46
Qwen3-Omni     2026.04   1.86          0.44                      0.38
Qwen2-Audio    2026.04   2.94          0.35                      0.26
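
The page does not define how the three metrics are computed or what E-A and S-A stand for. As a rough illustration only, the sketch below implements the textbook definitions of mean squared error and Pearson correlation between two score lists. The function names and the sample scores are hypothetical and are not taken from the benchmark.

```python
import math
from typing import Sequence


def mean_squared_error(pred: Sequence[float], ref: Sequence[float]) -> float:
    """Average squared difference between predicted and reference scores."""
    if len(pred) != len(ref) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical scores: an automatic judge vs. reference annotations.
    judge_scores = [4.0, 3.0, 5.0, 2.0]
    reference_scores = [4.5, 3.0, 4.0, 2.5]
    print("MSE:", mean_squared_error(judge_scores, reference_scores))
    print("Pearson:", pearson_correlation(judge_scores, reference_scores))
```

Under these definitions a lower MSE and a higher correlation both indicate closer agreement with the reference scores, which matches the ordering of the rows in the table above.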