User Simulation Quality Assessment on Session-level Human Evaluation Set adversarial (test)
[Chart: Win Rate, Tie Rate and Loss Rate over time. Top result: UserLM-R1, Win Rate 86 (Jan 14, 2026).]
Evaluation Results
Method     | Setting                   | Date    | Win Rate | Tie Rate | Loss Rate
UserLM-R1  | Opponent=Gemini-2.5-Flash | 2026.01 | 86       | 22       | 12
UserLM-R1  | Opponent=UserLM-R1 (w/... | 2026.01 | 55       | 32       | 33
UserLM-R1  | Opponent=DeepSeek-R1      | 2026.01 | 44       | 28       | 48
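The three values in each row sum to 120 (86 + 22 + 12, 55 + 32 + 33, 44 + 28 + 48), which suggests they may be session counts rather than percentages. Under that assumption, the following minimal sketch converts each row to percentages. The `results` dictionary and the `to_percentages` helper are illustrative only and are not part of the benchmark's own tooling. The second opponent name is kept truncated exactly as it appears in the source.

```python
# Assumption: the Win/Tie/Loss values are raw session counts
# (each row sums to 120), not percentages.
results = {
    "Gemini-2.5-Flash": (86, 22, 12),
    "UserLM-R1 (w/...": (55, 32, 33),  # name truncated in the source
    "DeepSeek-R1": (44, 28, 48),
}


def to_percentages(win, tie, loss):
    """Convert win/tie/loss counts to percentages rounded to one decimal."""
    total = win + tie + loss
    return tuple(round(100 * x / total, 1) for x in (win, tie, loss))


for opponent, counts in results.items():
    w, t, l = to_percentages(*counts)
    print(f"UserLM-R1 vs {opponent}: win {w}%, tie {t}%, loss {l}%")
```

For the Gemini-2.5-Flash row, this gives 71.7% wins, 18.3% ties and 10.0% losses.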