Single-Turn User Simulation on PEARL (test)
[Chart: BertScore. Best result: UserSim-Qwen 8B, 0.9373 (Mar 19, 2026). Available metrics: BertScore, Dist-4, Avg. Word Count.]
Evaluation Results
| Method | Details | Date | BertScore | Dist-4 | Avg. Word Count |
|---|---|---|---|---|---|
| UserSim-Qwen 8B | Mode=Fine-tuned, Param... | 2026.03 | 0.9373 | 0.3093 | 30.7637 |
| UserSim-Llama 8B | Mode=Fine-tuned, Param... | 2026.03 | 0.9364 | 0.3023 | 31.2219 |
| Qwen3 32B | Mode=Zero-shot, Parame... | 2026.03 | 0.8921 | 0.3409 | 31.6797 |
| Llama3.1 70B | Mode=Zero-shot, Parame... | 2026.03 | 0.887 | 0.2786 | 43.4067 |
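
For readers unfamiliar with the Dist-4 and Avg. Word Count columns, the following is a minimal sketch of how such metrics are commonly computed. It assumes whitespace tokenization and a corpus-level Distinct-n (unique 4-grams divided by total 4-grams across all responses); the benchmark's actual tokenizer and aggregation are not stated on this page and may differ. The function names `distinct_n` and `avg_word_count` are hypothetical, chosen for illustration only.

```python
from collections import Counter


def distinct_n(responses, n=4):
    """Corpus-level Distinct-n: unique n-grams / total n-grams.

    Assumption: tokens are obtained by whitespace splitting.
    """
    ngrams = Counter()
    for text in responses:
        tokens = text.split()
        # Slide a window of size n over the token list.
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    # Return 0.0 when no response is long enough to contain an n-gram.
    return len(ngrams) / total if total else 0.0


def avg_word_count(responses):
    """Mean number of whitespace-separated words per response."""
    if not responses:
        return 0.0
    return sum(len(text.split()) for text in responses) / len(responses)


if __name__ == "__main__":
    sample = ["a b c d e", "a b c d f"]
    print(distinct_n(sample, n=4))
    print(avg_word_count(sample))
```

Under this definition, a higher Dist-4 indicates less repeated phrasing across generated responses, while Avg. Word Count only describes response length.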