| Dataset Name | SOTA Method | Metric | Value | Trend | |
|---|---|---|---|---|---|
| Synthetic Exposure Overall | Llama-3.2-3B-Instruct +SFT+DPO | Accuracy (%) | 55 | 19 | 1mo ago |
| MIND Real Exposure | | Accuracy | 34.8 | 19 | 1mo ago |
| Instruments | CoARS | F1 Score | 38.12 | 13 | 1mo ago |
| MovieLens | CoARS | F1 Score | 29.74 | 13 | 1mo ago |
| LastFM | CoARS | F1 Score | 31.45 | 13 | 1mo ago |
| Adversarial User Simulation Dataset Turn-level (test) | | Robotics Score | 95 | 11 | 3mo ago |
| User Simulation Dataset Session-level (test) | UserLM-R1-32B | Role Score | 95.21 | 11 | 3mo ago |
| WildChat (In-Distribution) | | Turn Count | 8.62 | 8 | 21d ago |
| UserLLM | DITTO | Primary Metric Score | 93 | 6 | 13d ago |
| MirrorBench | DITTO | Realism Score (LLM-judge) | 0.713 | 6 | 13d ago |
| ConvApparel Efficient Matchmaker Ultra-terse Fast V2 (held-out agent) | Static, Agnostic | Avg. Words per Turn | 25.4 | 5 | 21d ago |
| ConvApparel Domain Expert Academic Verbose V2 (held-out agent) | Dynamic, Aware | Average Words per Turn | 12.1 | 5 | 21d ago |
| HUMANUAL (test) | HUMANLM | News Score | 40.58 | 3 | 3mo ago |