Multi-Turn User Simulation on PEARL (test)
[Chart: Success Rate (SR) by result date. Metric views: Success Rate (SR), Expected Turns (ET), Failure Rate (FR). Best SR: 95.31, UserSim-Qwen 8B (Mar 19, 2026).]
Evaluation Results
| Method | Settings | Date | Success Rate (SR) | Expected Turns (ET) | Failure Rate (FR) |
|---|---|---|---|---|---|
| UserSim-Qwen 8B | Mode=Fine-tuned, Param... | 2026.03 | 95.31 | 0.0232 | 2.37 |
| UserSim-Llama 8B | Mode=Fine-tuned, Param... | 2026.03 | 92.65 | 0.0498 | 2.37 |
| Qwen3 32B | Mode=Zero-shot, Parame... | 2026.03 | 77.72 | 0.0097 | 21.31 |
| Llama3.1 70B | Mode=Zero-shot, Parame... | 2026.03 | 36.15 | 0.0033 | 63.51 |