Policy Optimization on Glucose
[Chart: True Outcome over time (highlighted: ORPO*, 6.3, Apr 13, 2026). Metric views: True Outcome, Proxy Performance, Worst-Case Result.]
Evaluation Results
Method                         Date     True Outcome  Proxy Performance  Worst-Case Result
ORPO* (Environment=Glucose)    2026.04  6.3           116.36             -8.79
Max-Min (Environment=Glucose)  2026.04  6.3           102.66             -1.71
ORPO (Environment=Glucose)     2026.04  6             100.48             -27.54