User Simulation on User Simulation Dataset Session-level (test)

Top result: UserLM-R1-32B, Role Score 95.21

[Figure: Role Score chart. Date shown: Jan 14, 2026]
Updated 1mo ago

Evaluation Results

Date     Role Score  (unlabeled)  (unlabeled)  (unlabeled)
2026.01  95.21       46.08        57.46        73.49
-        94.29       58.75        46.92        73.56
2026.01  92.25       32.71        45.96        65.79
-        92          54.87        40.79        69.92
2026.01  89.29       37.71        41.46        64.44
2026.01  87.67       38.92        37.96        63.06
2026.01  86.09       56.34        44.7         68.31
2026.01  85.57       29.85        34.21        58.8
2026.01  85.24       30.25        37.88        59.65
-        84.58       47.54        39.71        64.1
-        45.58       13.87        14.5         29.88