Open-ended reasoning on WildBench
[Chart: Creative Score by method. Top entry: PPO w/ Structure Reward, 57.05 (Mar 30, 2026). Selectable metrics: Creative Score, Planning Score, Math Score, Info Score, Code Score, WB Score.]
Updated 2mo ago
Evaluation Results
| Method | RL framework | Date | Creative Score | Planning Score | Math Score | Info Score | Code Score | WB Score |
|---|---|---|---|---|---|---|---|---|
| PPO w/ Structure Reward | PPO | 2026.03 | 57.05 | 45.87 | 27.7 | 53.47 | 30.05 | 40.26 |
| GRPO w/ Structure Reward | GRPO | 2026.03 | 55.34 | 43.56 | 26.83 | 51.98 | 29.91 | 39.01 |
| DPO | - | 2026.03 | 51.16 | 37.82 | 17.06 | 48.37 | 14.34 | 30.34 |
| Base | - | 2026.03 | 51.01 | 36.23 | 16.35 | 48.71 | 14.72 | 29.91 |
| GRPO w/ Entropy Minimization (EMPO) | GRPO | 2026.03 | 51.01 | 36.11 | 17.62 | 45.69 | 12.92 | 29.2 |