
Constrained Markov Decision Process on Extended Chain CMDP (last 1,000 episodes)

Best Return: 4.768 (Unconstrained)

Updated 3mo ago

Evaluation Results

Method	Date	Return	[unlabeled metric]	[unlabeled metric]
Unconstrained	2026.04	4.768	93.2	2.45
MC-CPO	2026.04	3.538	30.4	2.179
Post-hoc	2026.04	2.941	0	2.406