WoW

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Question Answering | WoW | Average Performance Score: 84.3 | 20 |
| Knowledge-grounded Dialogue | WoW | F1 Score: 17.35 | 15 |
| Retrieval-Augmented Generation | WoW | LLM Score: 88.87 | 11 |
| Dialogue | WoW | F1 Score: 14.77 | 8 |
| Automatic Speech Recognition | WoW Out-of-domain (test) | WER: 12.7 | 6 |
| Knowledge-grounded Dialog Generation | WoW (Seen) | Appropriateness Score: 4.5 | 6 |
| Natural Language Generation | WoW | Mean Relevance: 4.68 | 5 |
| Enterprise Workflow Automation | WoW | Perfect Match: 30.5 | 4 |
| Knowledge-grounded Dialogue | WoW (test) | Dialogue Turns: 163 | 2 |