
Extended Chain CMDP

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Constrained Markov Decision Process | Extended Chain CMDP (last 1,000 episodes) | Return: 4.768 | 3 |
| Safety-Constrained Reinforcement Learning | Extended Chain CMDP (last 1,000 episodes) | Jc2 Constraint Metric: 0.069 | 3 |