Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations
About
Dialogue-level quality estimation is vital for optimizing data-driven dialogue management. Current automated methods for estimating turn-level and dialogue-level user satisfaction employ hand-crafted features and rely on complex annotation schemes, which reduce the generalizability of the trained models. We propose a novel user satisfaction estimation approach that minimizes an adaptive multi-task loss function in order to jointly predict turn-level Response Quality labels provided by experts and explicit dialogue-level ratings provided by end users. The proposed BiLSTM-based deep neural net model automatically weighs each turn's contribution towards the estimated dialogue-level rating, implicitly encodes temporal dependencies, and removes the need to hand-craft features. On dialogues sampled from 28 Alexa domains, two dialogue systems, and three user groups, the joint dialogue-level satisfaction estimation model achieved absolute improvements of up to 27% (0.43 -> 0.70) and 7% (0.63 -> 0.70) in linear correlation over the baseline deep neural net and the benchmark gradient boosting regression models, respectively.
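The three ideas named in the abstract (per-turn weighting, a jointly minimized adaptive loss, and evaluation by linear correlation) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the turn weighting is shown as a softmax over per-turn logits, the adaptive loss is shown as uncertainty-based task weighting (one common form of adaptive multi-task weighting; the abstract does not specify which form is used), and linear correlation is taken to mean Pearson's r. It is not the model described above, which is a trained BiLSTM network.

```python
import math

def softmax(logits):
    """Normalize per-turn logits into weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dialogue_rating(turn_scores, turn_logits):
    """Hypothetical aggregation: dialogue-level estimate as a weighted
    sum of turn-level scores, with weights from a softmax over logits."""
    weights = softmax(turn_logits)
    return sum(w * s for w, s in zip(weights, turn_scores))

def adaptive_joint_loss(turn_loss, dialogue_loss, log_var_turn, log_var_dialogue):
    """Assumed form of an adaptive multi-task loss: each task loss is scaled
    by exp(-log_var), and log_var is added as a regularizer so the weights
    cannot collapse to zero. In training, the log_var terms would be learned."""
    return (math.exp(-log_var_turn) * turn_loss + log_var_turn
            + math.exp(-log_var_dialogue) * dialogue_loss + log_var_dialogue)

def pearson(xs, ys):
    """Pearson linear correlation between predictions and ratings."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With equal logits, every turn contributes equally and the dialogue-level estimate reduces to the mean of the turn scores; raising one turn's logit shifts the estimate towards that turn's score.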
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| User Satisfaction Estimation | MWOZ | Accuracy | 47.6 | 14 |
| User Satisfaction Estimation | SGD | Accuracy | 57.4 | 14 |
| User Satisfaction Estimation | JDDC | Accuracy | 58.3 | 14 |
| User Satisfaction Estimation | Bing Copilot 0.8% training size (test) | Precision | 57.7 | 8 |
| User Satisfaction Estimation | MWOZ 5% training size (test) | Precision | 33.3 | 8 |
| User Satisfaction Estimation | SGD 5% training size (test) | Precision | 49.6 | 8 |
| User Satisfaction Estimation | ReDial 5% training size (test) | Precision | 40.6 | 8 |