SOTA Dialogue Generation benchmarks and papers with code | Wizwand
Dialogue Generation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| PersonaChat (test) | LMEDR | Persona Consistency | 2.31 | 27 | 1mo ago |
| DailyDialog | BART joint† (D) | Distinct-1 | 9.12 | 26 | 1mo ago |
| CONVAI2 | SMoA(r=32,n=2) | BLEU | 3.68 | 24 | 1mo ago |
| Douban (test) | Ours | BLEU-1 | 0.1398 | 20 | 1mo ago |
| ConvAI2 (test) | TeRA | BLEU | 3.38 | 20 | 3d ago |
| Wizard of Wikipedia (WoW) (dev) | KID | F1 Score | 16.4 | 19 | 1mo ago |
| Proposed Multi-scenario Dataset 1.0 (test) | SGM | Acc T | 86.37 | 18 | 1mo ago |
| Anthropic-HH (test) | Cal-DPO | Average Preference Score | 69.07 | 16 | 1mo ago |
| DailyDialog Multi-reference | DialoGPS | BLEU-1 | 38.46 | 16 | 1mo ago |
| TG-ReDial | TREA | BLEU-2 | 5 | 16 | 1mo ago |
| 4 dialogue datasets Aggregate (test val) | OPT | Dialogue Avg F1 | 12.9 | 15 | 1mo ago |
| Cognitive stimulation real 1.0 (test) | GCSD-3b | ROUGE-L | 27.63 | 13 | 1mo ago |
| CausalDialogue (test) | Human Written Responses | PPL | 1.2 | 13 | 1mo ago |
| CMU-DoG (test) | CKL | BLEU-1 | 17.74 | 13 | 1mo ago |
| Wizard of Wikipedia (WoW) seen (test) | CKL | BLEU-1 | 27.29 | 13 | 1mo ago |
| Reddit Conversation Corpus (test) | DialoGPT | PPL | 36.03 | 13 | 1mo ago |
| PERSONA-CHAT Original (dev) | LMEDR | Hits@1 | 89.5 | 13 | 1mo ago |
| PERSUASIVETOM | COSTOM | ToM Score (Judge: Llama-3.3-70B) | 80.2 | 12 | 4d ago |
| NEGOTIATIONTOM N=100 | COSTOM | ToM (Llama-3.3-70B) | 0.751 | 12 | 4d ago |
| Cognitive stimulation simulated dataset 1.0 (test) | GCSD-3b | ROUGE-L | 26.92 | 12 | 1mo ago |
| Syn. Persona | LongGuide | ROUGE-L | 22.98 | 12 | 1mo ago |
| PERSONA-CHAT Revised (dev) | LMEDR | Hits@1 | 85 | 11 | 1mo ago |
| E2E | BOMF | BLEU | 64.81 | 10 | 1mo ago |
| Commonsense Dialogue Dataset (test) | SaBART | Dist-1 | 0.0616 | 10 | 1mo ago |
| Dialogue-level evaluation Ethical and Affective Alignment | ETHICMIND_GPT-4o | Respectful Tone | 8.1739 | 9 | 5d ago |
Showing 25 of 67 rows