WoW

Benchmarks

Task Name	Dataset Name	SOTA Result
Question Answering	WoW	Average Performance Score 84.3	20
Knowledge-grounded Dialogue	WoW	F1 Score 17.35	15
Retrieval-Augmented Generation	WoW	LLM Score 88.87	11
Transition Prediction	WoW	IoU 41.32	10
Dialogue Generation	WoW (test)	Token-level F1 17.4	9
Dialogue	WoW	F1 Score 14.77	8
Automatic Speech Recognition	WoW Out-of-domain (test)	WER 12.7	6
Knowledge-grounded Dialog Generation	WoW (Seen)	Appropriateness Score 4.5	6
Natural Language Generation	WoW	Mean Relevance 4.68	5
Enterprise Workflow Automation	WoW	Perfect Match 30.5	4
Slang detection	WoW community (evaluation)	F1 Score 6.4	2
Knowledge-grounded Dialogue	WoW (test)	Dialogue Turns 163	2
