Dialogue

Benchmarks

Dataset Name	SOTA Method	Metric	Value	Count	Updated
MT-Bench	GPT-4o	MT-Bench Score	9.3	67	24d ago
MT-Bench (test)	LoRA	GPT-4 Score	8.36	46	4mo ago
IFEval	FuseChat3.0	IFEval	80.2	34	2mo ago
AlpacaEval 2	FuseChat3.0	AlpacaEval2 Score	64.2	34	2mo ago
4 dialogue tasks (Skill Talk, Empathetic Dialogues, Wizard of Internet, Wizard of Wikipedia) (test)		F1 Score	13.7	24	4mo ago
Dialogue	HBAT	PandaLM	77.79	18	4mo ago
Anthropic-HH (distillation set)		Response Word Count	73.53	16	4mo ago
MT-Bench	DFlash+DDTree	Speedup	4.18	12	3mo ago
HH (Anthropic Helpful and Harmless)	α-URM	Win Rate (0% Flip)	82.5	10	18d ago
DailyDialog	GPT2-tree	R-1	14.99	10	4mo ago
WoW	MindRef	F1 Score	14.77	8	4mo ago
USR (N = 198)	TCVA	Spearman's Rho	0.173	7	3mo ago
GROWOVER-DIALOGUE (NEW)	RiLM	BLEU (Month 9)	5.36	6	4mo ago
GROWOVER-DIALOGUE (UNCHANGED)	RiLM	BLEU (Month 9)	4.68	6	4mo ago
Dialogue (test)		Fluency	8.84	5	4mo ago
WildChat	BACO best	Lexical Coverage	47.3	4	1mo ago
MT-Bench	PARSE + EAGLE3	TPS (tok/s)	194	4	2mo ago
Dialogue dataset	M-RAG	BLEU-1	24.52	4	4mo ago
WildSpeech-Bench		Score	76.3	3	3mo ago
SpeechRole		Score	124.2	3	3mo ago
URO-Bench-pro		Understanding Score	69.1	3	3mo ago
VoiceBench	Qwen3.5-Omni-Plus	Score	93.1	3	3mo ago
TruthfulQA		Accuracy	92.2	2	4mo ago
MT-Bench (full set)		Accuracy (%)	9.3	2	4mo ago
