User Simulation Quality Assessment on Turn-level Human Evaluation Set adversarial (test)
[Chart: Win Rate, UserLM-R1 (168), Jan 14, 2026. Series: Win Rate, Tie Rate, Loss Rate.]
Evaluation Results

Method     Opponent            Date     Win Rate  Tie Rate  Loss Rate
UserLM-R1  DeepSeek-R1         2026.01  168       20        42
UserLM-R1  Gemini-2.5-Flash    2026.01  142       40        38
UserLM-R1  UserLM-R1 (w/...    2026.01  67        111       42