Proactive Dialogue Evaluation on DuRecDial OOD 2.0 (test)
[Chart: Proactivity scores by model. Highlighted point: Qwen 32B, 4.07, May 12, 2026. Selectable metrics: Proactivity, Coherence, Appropriateness, Informativeness.]
Updated 21d ago
Evaluation Results
Method     Details                    Date     Proactivity  Coherence  Appropriateness  Informativeness
Qwen 32B   model_size=32B, protoc...  2026.05  4.07         4.91       4.86             4.13
Qwen 14B   model_size=14B, protoc...  2026.05  4.02         4.92       4.83             3.83
LLaMA 8B   model_size=8B, protoco...  2026.05  3.78         4.82       4.69             4.03
Ours 0.3B  model_size=0.3B            2026.05  2.71         4.33       3.98             1.98
Qwen 3B    model_size=3B, protoco...  2026.05  2.35         4.29       4.09             1.88
LLaMA 3B   model_size=3B, protoco...  2026.05  2.32         4.29       4.07             1.89
LLaMA 1B   model_size=1B, protoco...  2026.05  2.31         4.33       3.97             1.89