Task-oriented Dialogue on MultiWOZ 2.1 (SR, HR)
[Chart: Success Rate (SR) over time. Best result: VLK-RL, 51.24 (Apr 25, 2026). Metrics available: Success Rate (SR), Hit Rate (HR).]
Evaluation Results

| Method | Configuration | Date | Success Rate (SR) | Hit Rate (HR) |
|---|---|---|---|---|
| VLK-RL | backbone=Qwen-14B | 2026.04 | 51.24 | 3.18 |
| VLK-RL | backbone=Qwen-7B | 2026.04 | 49.36 | 3.07 |
| VLK-RL | backbone=GPT-4o-mini | 2026.04 | 47.13 | 3.04 |
| GDP-Zero | | 2026.04 | 41.2 | 2.61 |
| CAPID | | 2026.04 | 40.56 | 2.63 |
| TransferTOD | | 2026.04 | 39.82 | 2.58 |
| GALAXY | | 2026.04 | 38.75 | 2.46 |
| ACGOS | | 2026.04 | 34.1 | 2.35 |
| PPO | | 2026.04 | 28.5 | 2.21 |