Target-guided proactive dialogue generation on DuRecDial ID 2.0 (test)
[Chart: PPL by method. Highlighted point: Our, PPL 3.93 (May 12, 2026). Selectable metrics: PPL, Weighted F1, BLEU-1, BLEU-2, Diversity-1, Diversity-2, Knowledge F1, Failure Rate.]
Updated 21d ago
Evaluation Results
| Method | Variant | Date | PPL | Weighted F1 | BLEU-1 | BLEU-2 | Diversity-1 | Diversity-2 | Knowledge F1 | Failure Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| Our | mode=soft | 2026.05 | 3.93 | 44.87 | 41.6 | 31 | 2.2 | 8.7 | 61.17 | 19.77 |
| T5-Flan | | 2026.05 | 3.96 | 42.72 | 39 | 28.8 | 1.8 | 7.1 | 55.26 | 21.29 |
| Our | mode=hard | 2026.05 | 3.96 | 44.79 | 41.6 | 30.9 | 2.2 | 8.8 | 61.11 | 19.77 |
| TPDial | repro=⋄ | 2026.05 | 5.23 | 38.29 | 32 | 22.2 | 2.4 | 8.3 | 50.97 | 63.88 |
| TRIPDial | repro=⋄ | 2026.05 | 6.19 | 35.06 | 31.3 | 22.3 | 2.2 | 7.4 | 43.35 | 67.11 |