Helpfulness Evaluation on Pause-and-think B
[Chart: Conciseness score for pause-and-think (Ours): 80.6 (May 30, 2026). Available metrics: Conciseness, Politeness, Aggregated Helpfulness.]
Updated 1d ago
Evaluation Results
Method                  Details                    Date     Conciseness  Politeness  Aggregated Helpfulness
pause-and-think (Ours)  parameters=4B, trainin...  2026.05  80.6         71.4        76
Qwen3VL-235B            parameters=235B            2026.05  65.8         75.7        70.7
GPT 5.2                 -                          2026.05  63.4         77.7        70.5