Dialogue Generation on CMU-DoG
Chart: Relevance Win % over time. CKL: 49.51 (May 29, 2023).
Updated 4d ago
Evaluation Results
Method: CKL (Comparison Model=DIALKI)
Date: 2023.05

Criterion             Win %    Loss %   Tie %    Kappa
Relevance             49.51    20.79    29.7     0.39
Coherence             44.55    26.73    28.72    0.45
Informativeness       48.51    39.61    11.88    0.42
Overall Preference    50.5     38.61    10.89    0.41