User Simulation on HUMANUAL (test)
[Chart: News Score by method. Labeled point: HUMANLM, 40.58 (Feb 7, 2026). Selectable metrics: News, Book, Opinion, Politics, Chat, Email, and Average Score.]
Updated 1mo ago
Evaluation Results
| Method | Details | Date | News Score | Book Score | Opinion Score | Politics Score | Chat Score | Email Score | Average Score |
|---|---|---|---|---|---|---|---|---|---|
| HUMANLM | Training Algorithm=GRP... | 2026.02 | 40.58 | 57.1 | 46.21 | 40.68 | 46.21 | 43.63 | 45.7 |
| GRPO-think | Training Algorithm=GRP... | 2026.02 | 38.07 | 55.48 | 46.33 | 40.06 | 39.7 | 42.3 | 43.7 |
| Qwen3-8b-think | Backbone=Qwen3-8b, Sam... | 2026.02 | 36.33 | 55.35 | 44.5 | 39.78 | 38.17 | 40.7 | 42.5 |
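The Average Score values in the rows above appear consistent with an unweighted mean of the six domain scores, rounded to one decimal place. That is an observation from the listed numbers, not a definition stated on this page. A minimal Python sketch of the check follows; the names `domain_scores`, `reported_average`, `mean_score`, and `matches_reported` are hypothetical and chosen only for illustration.

```python
# Scores copied from the leaderboard rows, in column order:
# News, Book, Opinion, Politics, Chat, Email.
domain_scores = {
    "HUMANLM": [40.58, 57.1, 46.21, 40.68, 46.21, 43.63],
    "GRPO-think": [38.07, 55.48, 46.33, 40.06, 39.7, 42.3],
    "Qwen3-8b-think": [36.33, 55.35, 44.5, 39.78, 38.17, 40.7],
}

# Average Score column as reported on the page.
reported_average = {
    "HUMANLM": 45.7,
    "GRPO-think": 43.7,
    "Qwen3-8b-think": 42.5,
}


def mean_score(scores):
    """Unweighted arithmetic mean of a list of scores."""
    return sum(scores) / len(scores)


def matches_reported(method, tolerance=0.05):
    """True if the computed mean is within rounding distance of the reported average.

    Assumption: the page rounds to one decimal, so a difference of up to
    0.05 is treated as a match.
    """
    return abs(mean_score(domain_scores[method]) - reported_average[method]) <= tolerance


for method in domain_scores:
    print(method, round(mean_score(domain_scores[method]), 2), matches_reported(method))
```

If the benchmark actually weights domains by test-set size or aggregates in some other way, the agreement here would be coincidental, so the check is best read as a sanity test on the table.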