Response Selection

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
Douban Conversation Corpus (test)	BERT_TL	MAP	0.675	94	4mo ago
E-commerce (test)	BERT_TL	Recall@1 (R10)	0.927	81	4mo ago
Ubuntu (test)	BERT-UMS+FGC	Recall@1 (Top 10)	0.886	58	4mo ago
DSTC7 Track 1 (test)	Cross-encoder	Recall@1 (Top 100)	91.1	27	4mo ago
ConvAI2 (dev)	Cross-encoder	R@1/20	90.3	25	4mo ago
Ubuntu v2 (test)	Cross-encoder	MRR	91.9	20	4mo ago
MWOZ 2.1	FutureTOD	Accuracy (1/100)	68.5	17	4mo ago
P-Soups Expertise	Qwen3-32B thinking	Accuracy	83.66	16	4mo ago
P-Soups Style	Qwen3-32B thinking	Accuracy	0.88	16	4mo ago
P-Soups Informativeness	ALIGNXPLORE+	Accuracy	78.07	16	4mo ago
PersonaMem	TALLRec	Accuracy	64.36	16	4mo ago
AlignX	ALIGNXPLORE+	Accuracy	75.03	16	4mo ago
ConvAI2 (test)	Cross-encoder	R@20	87.9	16	4mo ago
PERSONA-CHAT Revised (test)	P5	R@1	82.79	11	4mo ago
PERSONA-CHAT Original Persona (test)	P5	R@1	87.45	11	4mo ago
Reddit SC (test)		Perplexity@Top-1	181.8	11	4mo ago
Reddit MC (test)	CFC-QS	Perplexity@1	194.8	11	4mo ago
Ubuntu IRC Len-15 (test)	MPC-BERT	R@2	89.7	10	4mo ago
Ubuntu IRC Len-10 (test)	MPC-BERT	R@2	89.14	10	4mo ago
Ubuntu IRC Len-5 (test)	MPC-BERT	Recall@2	87.63	10	4mo ago
Ubuntu IRC (test)	MPC-BERT	R2@1	94.9	8	4mo ago
Reddit (test)	ConveRT	R@1 (R100)	71.8	7	4mo ago
AmazonQA (test)	ConveRT	R@1 (K=100)	84.3	6	4mo ago
Focus 1.0 (val)	P5	R@1	97.85	3	4mo ago
PersonaChat (test)	Uni-Encoder	R@1 (R20 Context)	86.9	3	4mo ago

Showing 25 of 29 rows
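
Most rows above report a ranking metric such as Recall@k over N candidates or MRR. The following is a minimal sketch of how such metrics are commonly computed, assuming each context has exactly one gold response and a list of model scores, one per candidate. The function names and the toy data are illustrative assumptions, not taken from any listed method, and individual benchmarks may define their metrics differently (for example, with multiple correct responses per context). Note also that the table mixes two scales: some values are fractions (0.886) and others are percentages (91.1).

```python
from typing import List, Sequence, Tuple


def rank_candidates(scores: Sequence[float]) -> List[int]:
    """Return candidate indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(scores: Sequence[float], gold_index: int, k: int) -> float:
    """1.0 if the gold candidate is among the k top-scored candidates, else 0.0."""
    return 1.0 if gold_index in rank_candidates(scores)[:k] else 0.0


def reciprocal_rank(scores: Sequence[float], gold_index: int) -> float:
    """1 / (1-based rank of the gold candidate)."""
    return 1.0 / (rank_candidates(scores).index(gold_index) + 1)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical toy data: (candidate scores, index of the gold response).
examples: List[Tuple[List[float], int]] = [
    ([0.1, 0.7, 0.2], 1),  # gold candidate is ranked first
    ([0.5, 0.3, 0.2], 1),  # gold candidate is ranked second
]

recall_1 = mean([recall_at_k(s, g, 1) for s, g in examples])
recall_2 = mean([recall_at_k(s, g, 2) for s, g in examples])
mrr = mean([reciprocal_rank(s, g) for s, g in examples])

print(f"Recall@1={recall_1:.2f} Recall@2={recall_2:.2f} MRR={mrr:.2f}")
```

With N candidates per context, `recall_at_k(..., k=1)` corresponds to what the table writes as R@1 (R10), R@1/20, or Recall@1 (Top 100), depending on N. Multiply by 100 to compare against the rows reported as percentages.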