Proactive Dialogue Evaluation on DuRecDial ID 2.0 (test)
[Chart: Proactivity score. Highlighted point: Qwen 14B, 3.86 (May 12, 2026). Metrics shown: Proactivity, Coherence, Appropriateness, Informativeness.]
Updated 21d ago
Evaluation Results
| Method | Settings | Date | Proactivity | Coherence | Appropriateness | Informativeness |
|---|---|---|---|---|---|---|
| Qwen 14B | model_size=14B, protoc... | 2026.05 | 3.86 | 4.91 | 4.85 | 3.83 |
| Qwen 32B | model_size=32B, protoc... | 2026.05 | 3.86 | 4.91 | 4.85 | 4.01 |
| LLaMA 8B | model_size=8B, protoco... | 2026.05 | 3.74 | 4.82 | 4.73 | 4.17 |
| Ours 0.3B | model_size=0.3B | 2026.05 | 2.45 | 4.41 | 4.3 | 2.16 |
| Qwen 3B | model_size=3B, protoco... | 2026.05 | 2.21 | 4.3 | 4.31 | 2.05 |
| LLaMA 1B | model_size=1B, protoco... | 2026.05 | 2.12 | 4.32 | 4.25 | 2.07 |
| LLaMA 3B | model_size=3B, protoco... | 2026.05 | 2.08 | 4.23 | 4.18 | 1.98 |