
DailyDialog

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Emotion Detection | DailyDialog (test) | Micro-F1: 0.6034 | 53 |
| Dialogue Emotion Detection | DailyDialog | Micro F1 (- neutral): 0.6167 | 27 |
| Dialogue Generation | DailyDialog | Distinct-1: 9.12 | 26 |
| Knowledge Retrieval | DailyDialog | BERTScore Precision (avg): 84.29 | 16 |
| Dialogue Response Selection | DailyDialog reformatted multiple-choice (test) | Accuracy: 90.35 | 16 |
| Dialogue Generation | DailyDialog Multi-reference | BLEU-1: 38.46 | 16 |
| Response Generation | DailyDialog (test) | BLEU-2: 35.4 | 16 |
| Emotion Recognition in Conversation | DailyDialog (test) | F1 Score: 0.6312 | 16 |
| Emotion Recognition in Conversations | DailyDialog | Macro F1: 59.33 | 15 |
| Attribute-Controlled Dialogue Generation | DailyDialog-CG (test) | Emotion Accuracy (E-ACC): 70.66 | 12 |
| Dialogue Evaluation | DailyDialog (eval) | Spearman Correlation: 0.579 | 10 |
| Dialogue | DailyDialog | R-1: 14.99 | 10 |
| Knowledge Pair-wise Diversity | DailyDialog (test) | Precision: 89.58 | 9 |
| Human Logic Alignment | DailyDialog | Human Logic Alignment (T=0.5): 80.97 | 9 |
| Red Teaming | DailyDialog against DialoGPT-large | RSR: 40 | 8 |
| Red Teaming | DailyDialog against BB-3B | RSR: 40.2 | 8 |
| Dialogue Policy Evaluation | Dailydialog (test) | USR MLM: 81.1 | 8 |
| DialogAct label control | DailyDialog multi-reference (test) | Accuracy: 80.25 | 7 |
| Cause Entailment | DailyDialog (DD) (Fold 1) | F1 (Positive): 69.2 | 7 |
| Dialogue Generation | Dailydialog | Attribute Relevancy: 34.7 | 6 |
| Response Generation | DailyDialog | Pairwise Diversity: 78.5 | 6 |
| Text Generation | DailyDialog (test) | BERTscore: 0.8404 | 6 |
| Dialogue Coherence | DailyDialog | QuantiDCE: 3.24 | 6 |
| Dialogue Emotion Recognition | DailyDialog | Micro F1 (Neutral): 0.5629 | 6 |
| Dialogue Evaluation | DailyDialog | USR RET: 0.998 | 4 |

Showing 25 of 32 rows