Constrained Markov Decision Process on Extended Chain CMDP (last 1,000 episodes)
[Chart: Return by method. Highest point shown: Unconstrained, 4.768 (Apr 5, 2026). Available metrics: Return, pi(hack) Score, RHSI (Raw Value).]
Updated 12d ago
Evaluation Results
Method         Date      Return   pi(hack) Score   RHSI (Raw Value)
Unconstrained  2026.04   4.768    93.2             2.45
MC-CPO         2026.04   3.538    30.4             2.179
Post-hoc       2026.04   2.941    0                2.406