Maximum-likelihood infilling on Cosmopedia 30 prompts
[Chart: Mean Reward by date for GRPO, May 26, 2026. Selectable metrics: Mean Reward, Best-of-8 Score, Variance, Entropy, Prefix Cross-Entropy (CE), Text Cross-Entropy (CE)]
Updated 6d ago
Evaluation Results
Method                             Mean Reward  Best-of-8 Score  Variance  Entropy  Prefix Cross-Entropy (CE)  Text Cross-Entropy (CE)
GRPO (Training step=8k, 2026.05)   184          181.53           13.55     0.12     19.89                      164.11
Frost (Training step=8k, 2026.05)  182.87       172.77           81.87     0.58     21.18                      161.69