MirrorBench

Benchmarks

Task Name	Dataset Name	SOTA Result	Trend
Conversation Simulation	MirrorBench	Primary Metric: 68.3		9
User Simulation	MirrorBench	Realism Score (LLM-judge): 0.713		6
