
MuTual

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-turn Dialogue Reasoning | MuTual (test) | MRR | 84.6 | 19 |
| Dialogue Reasoning | MUTUAL | AIBC Score | 0.467 | 12 |
| Dialogue Generation | MuTual (dev) | MRR | 54.5 | 8 |
| Knowledge-Grounded Conversation | Mutual Non-Biased | Performance | 53 | 5 |
| Knowledge-Grounded Conversation | Mutual Biased | Performance | 94 | 5 |
| Multi-turn dialogue reasoning | MuTual+ | R@1 | 81.49 | 3 |
| Multi-turn dialogue reasoning | MuTual | R@1 | 88.93 | 3 |
| Dialogue Reasoning | MuTual (dev) | R4@1 | 73.4 | 3 |
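Several rows above report ranking metrics (R4@1, R@1, MRR). In MuTual, each dialogue context comes with four candidate responses, exactly one of which is correct, so a model is scored by where it ranks the correct candidate. The sketch below illustrates how recall-at-k and mean reciprocal rank can be computed from per-candidate model scores. It is a minimal illustration, not the official MuTual evaluation script: the function names are hypothetical, and breaking ties pessimistically (an equal score counts as ranked above the correct candidate) is an assumption. The leaderboard values such as 84.6 appear to be these fractions multiplied by 100.

```python
from typing import Sequence


def rank_of_correct(scores: Sequence[float], correct_idx: int) -> int:
    """1-based rank of the correct candidate when sorted by descending score.

    Assumption: ties are broken pessimistically, i.e. any other candidate with
    a score >= the correct one's is counted as ranked above it.
    """
    correct_score = scores[correct_idx]
    better = sum(1 for i, s in enumerate(scores) if i != correct_idx and s >= correct_score)
    return 1 + better


def recall_at_k(all_scores: Sequence[Sequence[float]], all_correct: Sequence[int], k: int) -> float:
    """Fraction of examples whose correct candidate is ranked within the top k (R4@k for 4 candidates)."""
    if not all_scores:
        raise ValueError("need at least one example")
    hits = sum(1 for scores, c in zip(all_scores, all_correct) if rank_of_correct(scores, c) <= k)
    return hits / len(all_scores)


def mean_reciprocal_rank(all_scores: Sequence[Sequence[float]], all_correct: Sequence[int]) -> float:
    """Average of 1 / rank of the correct candidate over all examples."""
    if not all_scores:
        raise ValueError("need at least one example")
    total = sum(1.0 / rank_of_correct(scores, c) for scores, c in zip(all_scores, all_correct))
    return total / len(all_scores)


if __name__ == "__main__":
    # Three toy dialogues, four candidate scores each (made-up numbers for illustration).
    scores = [
        [0.9, 0.1, 0.3, 0.2],  # correct candidate 0 is ranked 1st
        [0.2, 0.8, 0.5, 0.1],  # correct candidate 2 is ranked 2nd
        [0.4, 0.3, 0.2, 0.7],  # correct candidate 1 is ranked 3rd
    ]
    correct = [0, 2, 1]
    print("R4@1:", recall_at_k(scores, correct, 1))
    print("R4@2:", recall_at_k(scores, correct, 2))
    print("MRR :", mean_reciprocal_rank(scores, correct))
```

On the toy data the correct candidates land at ranks 1, 2 and 3, giving R4@1 = 1/3, R4@2 = 2/3 and MRR = (1 + 1/2 + 1/3) / 3 ≈ 0.611.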