| Dataset Name | SOTA Method | Metric | Trend | Updated | |
|---|---|---|---|---|---|
| Conversational Evaluation Suite AI companionship and Role-play (test) | Echo-N1 | Win Rate: 95.5 | 13 | 4d ago | |
| RaR Medicine | SibylSense-Adv | Pairwise Win Rate: 60.6 | 4 | 4d ago | |
| GovReport | SibylSense-Adv | Pairwise Win Rate: 52.9 | 3 | 4d ago | |