
Chatbot

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| General Utility Evaluation | Chatbot | Agree Score: 80 | 14 |
| Dialogue Management | Chatbot (unseen states) | Reward: 0.85 | 4 |
| Open-domain dialogue | Chatbot | Accuracy: 77.8 | 2 |