User Simulation on MirrorBench

0.713 Realism Score (LLM-judge)

DITTO

Updated 2mo ago

Evaluation Results

Method	Links	Realism Score (LLM-judge)
DITTO 2026.05		0.713
GRPO 2026.05		0.683
Qwen3-VL-8B-Instruct 2026.05		0.547
GPT-5.4 2026.05		0.536
HumanLM-8B 2026.05		0.481
GPT-5-nano 2026.05		0.358