
Role-playing Dialogue Evaluation on RoleChat gold-standard evaluation set

Logical Coherence: 96.6

GPT-4.1

7.78430.84253.976.958 Apr 15, 2026
Updated 1mo ago

Evaluation Results

Method  Links
2026.04  96.6  92.1  28.4  19.5  48.2  56.96  100
2026.04  94.8  90.2  85.1  75.9  84    86     100
2026.04  86.5  72.9  75.8  51.6  62.2  69.8   100
2026.04  66.2  46.5  21.2  14.6  35.4  36.78  94.3
2026.04  65.2  42.3  61.2  52.4  44.2  53.06  94.2
2026.04  63.8  43.3  51.6  22.1  35.5  43.26  75.8
2026.04  62.5  42.1  18.5  12.3  32.1  33.5   91.1
2026.04  40.9  25.1  42.1  11.1  34.1  30.66  10.2
2026.04  35.2  29.3  34.2  16.2  32.3  29.44  80
2026.04  11.2  23.2  43.2  12.1  22.1  22.36  6.2