SOTA Dialogue Generation benchmarks and papers with code | Wizwand
Dialogue Generation
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| CONVAI2 | SMoA | BLEU | 3.77 | 48 | 12d ago |
| UltraChat | No Def. | ASR Accuracy | 98.7 | 32 | 1mo ago |
| PersonaChat (test) | LMEDR | Persona Consistency | 2.31 | 27 | 3mo ago |
| DailyDialog | BART joint† (D) | Distinct-1 | 9.12 | 26 | 3mo ago |
| Douban (test) | Ours | BLEU-1 | 0.1398 | 20 | 3mo ago |
| ConvAI2 (test) | TeRA | BLEU | 3.38 | 20 | 1mo ago |
| Wizard of Wikipedia (WoW) (dev) | KID | F1 Score | 16.4 | 19 | 3mo ago |
| Proposed Multi-scenario Dataset 1.0 (test) | SGM | Acc T | 86.37 | 18 | 3mo ago |
| Vicuna | XPERT-OLMoE | ROUGE-L | 15.05 | 16 | 21d ago |
| SelfInst | XPERT-OLMoE | ROUGE-L | 11.31 | 16 | 21d ago |
| UnNI | XPERT-DeepSeek | ROUGE-L | 23.2 | 16 | 21d ago |
| S-NI | XPERT-OLMoE | ROUGE-L | 19.82 | 16 | 21d ago |
| DollyEval | XPERT-OLMoE | ROUGE-L | 24.19 | 16 | 21d ago |
| Anthropic-HH (test) | Cal-DPO | Average Preference Score | 69.07 | 16 | 3mo ago |
| DailyDialog Multi-reference | DialoGPS | BLEU-1 | 38.46 | 16 | 3mo ago |
| TG-ReDial | TREA | BLEU-2 | 5 | 16 | 3mo ago |
| 4 dialogue datasets Aggregate (test val) | OPT | Dialogue Avg F1 | 12.9 | 15 | 3mo ago |
| Cognitive stimulation real 1.0 (test) | GCSD-3b | ROUGE-L | 27.63 | 13 | 2mo ago |
| CausalDialogue (test) | Human Written Responses | PPL | 1.2 | 13 | 3mo ago |
| CMU-DoG (test) | CKL | BLEU-1 | 17.74 | 13 | 3mo ago |
| Wizard of Wikipedia (WoW) seen (test) | CKL | BLEU-1 | 27.29 | 13 | 3mo ago |
| Reddit Conversation Corpus (test) | DialoGPT | PPL | 36.03 | 13 | 3mo ago |
| PERSONA-CHAT Original (dev) | LMEDR | Hits@1 | 89.5 | 13 | 3mo ago |
| PERSUASIVETOM | COSTOM | ToM Score (Judge: Llama-3.3-70B) | 80.2 | 12 | 1mo ago |
| NEGOTIATIONTOM N=100 | COSTOM | ToM (Llama-3.3-70B) | 0.751 | 12 | 1mo ago |
Showing 25 of 81 rows