User Simulation on UserLLM
Figure: Primary Metric Score over time. Best result: DITTO, 93 (May 19, 2026).
Evaluation Results
Method                Details                     Date      Primary Metric Score
DITTO                 Backbone=Qwen3-VL-8B-I...   2026.05   93
GRPO                  Backbone=Qwen3-VL-8B-I...   2026.05   86.3
GPT-5.4               -                           2026.05   57.5
HER-32B               -                           2026.05   53.7
Qwen3-VL-8B-Instruct  Role=Base                   2026.05   46.9
GPT-5-nano            -                           2026.05   32.4