ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

About

Existing end-to-end modeling methods for modular task-oriented dialog (TOD) systems are typically tailored to specific datasets, which makes them hard to adapt to new dialog scenarios. In this work, we propose ESAinsTOD, a unified End-to-end Schema-Aware Instruction-tuning framework for general Task-Oriented Dialog modeling. Rather than simply fine-tuning Large Language Models (LLMs), the framework provides a structured methodology that adapts flexibly to various dialog task flows and schemas. Specifically, we apply full-parameter fine-tuning to LLMs and introduce two alignment mechanisms that make the resulting system both instruction-aware and schema-aware: (i) instruction alignment, which ensures that the system faithfully follows task instructions to complete the various task flows found in heterogeneous TOD datasets; and (ii) schema alignment, which encourages the system to make predictions that adhere to the specified schema. In addition, to bridge the gap between the instruction-tuning paradigm and the real-world application of TOD systems, we employ session-level end-to-end modeling, which allows the system to access the results of previously executed task flows within the dialog history. Empirical results show that while a fine-tuned LLM is a strong baseline, our structured approach provides significant additional benefits. In particular, we find that: (i) ESAinsTOD outperforms state-of-the-art models by a significant margin on the end-to-end TOD benchmarks CamRest676, In-Car, and MultiWOZ; (ii) more importantly, it generalizes better across various low-resource settings, and the proposed alignment mechanisms significantly enhance zero-shot performance; and (iii) our instruction-tuning paradigm substantially improves the model's robustness to data noise and cascading errors.

Dechuan Teng, Chunlin Lu, Libo Qin, Wanxiang Che • 2026
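The abstract describes the model inputs only at a high level. The sketch that follows shows one plausible way to render a single instruction-tuning example that combines a task instruction, a slot schema, and a session-level history carrying the outputs of earlier task flows, plus a filter illustrating what "adhering to the specified schema" means for a predicted dialog state. Everything here is hypothetical: the prompt layout, the `Schema` format, and the names `Turn`, `build_prompt`, and `enforce_schema` are illustrative assumptions, not ESAinsTOD's actual format or code. In particular, the paper describes schema alignment as encouraging schema-adherent predictions, whereas `enforce_schema` is only a post-hoc filter applied to outputs.

```python
# Illustrative sketch only; not the ESAinsTOD implementation.
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical schema format: domain -> slot -> allowed values (None = free-form value).
Schema = Dict[str, Dict[str, Optional[List[str]]]]


@dataclass
class Turn:
    """One completed turn of the session (hypothetical structure)."""
    user: str
    system: str
    # Results of task flows already executed in this turn (e.g. dialog state,
    # DB lookup), kept in the history so later turns can condition on them.
    flow_outputs: Dict[str, str] = field(default_factory=dict)


def build_prompt(instruction: str, schema: Schema, history: List[Turn], user_utt: str) -> str:
    """Assemble one instruction-tuning input: instruction + schema + session history."""
    lines = [f"Instruction: {instruction}", "Schema:"]
    for domain, slots in schema.items():
        for slot, values in slots.items():
            allowed = ", ".join(values) if values else "<free text>"
            lines.append(f"{domain}-{slot}: {allowed}")
    lines.append("History:")
    for turn in history:
        lines.append(f"User: {turn.user}")
        for flow, result in turn.flow_outputs.items():
            lines.append(f"[{flow}] {result}")
        lines.append(f"System: {turn.system}")
    lines.append(f"User: {user_utt}")
    return "\n".join(lines)


def enforce_schema(pred: Dict[str, str], schema: Schema) -> Dict[str, str]:
    """Keep only predicted "domain-slot" -> value pairs that the schema permits."""
    kept = {}
    for key, value in pred.items():
        domain, _, slot = key.partition("-")
        slots = schema.get(domain, {})
        if slot not in slots:
            continue  # unknown domain or slot: not part of the schema
        allowed = slots[slot]
        if allowed is None or value in allowed:
            kept[key] = value
    return kept


if __name__ == "__main__":
    schema = {"restaurant": {"area": ["centre", "north", "south"], "food": None}}
    history = [Turn("Find me a place in the centre.", "What cuisine would you like?",
                    {"state": "restaurant-area=centre"})]
    print(build_prompt("Track the dialog state.", schema, history, "Italian, please."))
    pred = {"restaurant-area": "centre", "restaurant-food": "italian", "restaurant-price": "cheap"}
    print(enforce_schema(pred, schema))  # {'restaurant-area': 'centre', 'restaurant-food': 'italian'}
```

Keeping each earlier turn's `flow_outputs` in the rendered history mirrors the session-level idea in the abstract: a later turn can see results that previously executed task flows already produced. How the real system serializes those results is not specified here.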

Related benchmarks

Task	Dataset	Metric	Result
Intent Classification	Banking77	Accuracy	92.89	260
Dialogue State Tracking	MultiWOZ 2.1 (test)	Joint Goal Accuracy	60.76	105
End-to-end task-oriented dialogue	MultiWOZ 2.1 (test)	BLEU Score	21.92	57
Dialogue State Tracking	MultiWOZ 2.0 (test)	Joint Goal Accuracy	57.23	29
Intent Classification	HWU64	Accuracy	92.75	17
Intent Classification	CLINC150	Accuracy	97.31	17
End-to-End Task-Oriented Dialog	In-Car	Match Rate	90.58	12
End-to-End Dialog Modeling	MultiWOZ 2.0	Inform Score	94.3	11
End-to-End Dialog Modeling	CamRest676	Match Score	98.5	6
Intent Detection and Slot Filling	SNIPS	Intent Accuracy	99.43	4
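Two of the rows above report Joint Goal Accuracy (JGA) for dialog state tracking. Under the standard definition, a turn counts as correct only if the full predicted dialog state matches the gold state exactly, and JGA is the percentage of correct turns. The following is a minimal sketch of that definition. The normalization step (lowercasing, stripping whitespace, dropping empty or "none" values) is an assumption made for illustration; official MultiWOZ evaluation scripts apply their own value-normalization rules, so scores from this function may not be directly comparable to the 60.76 and 57.23 reported in the table.

```python
from typing import Dict, List

State = Dict[str, str]  # "domain-slot" -> value


def normalize(state: State) -> State:
    """Lowercase/strip keys and values; drop empty or 'none' slots (assumed convention)."""
    out = {}
    for slot, value in state.items():
        v = value.strip().lower()
        if v and v != "none":
            out[slot.strip().lower()] = v
    return out


def joint_goal_accuracy(preds: List[State], golds: List[State]) -> float:
    """Percentage of turns whose predicted state exactly matches the gold state."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same number of turns")
    if not golds:
        return 0.0
    # A turn is correct only if every slot-value pair matches, with nothing missing or extra.
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return 100.0 * correct / len(golds)
```

Because a single wrong, missing, or extra slot makes the whole turn incorrect, JGA is a strict, all-or-nothing metric.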
