Safety-constrained Reinforcement Learning on Safety-Gym SafetyPointGoal1 (evaluation)

11.37 Average Reward

PPO

Updated 2mo ago

Evaluation Results

Method	Links	Average Reward
PPO 2026.05		11.37	109.87	1	4.395	0.103
PPO-Lag 2026.05		8.14	90.27	0.866	3.611	0.09
RiskGated 2026.05		5.39	36.93	0.474	1.477	0.146
TRPO-Lag 2026.05		2.99	195.33	0.318	7.813	0.015
RCPO 2026.05		2.99	195.33	0.318	7.813	0.015
CUP 2026.05		2.3	41.67	0.245	1.667	0.055
FOCOPS 2026.05		2.26	309.33	0.241	12.373	0.007
CPPO-PID 2026.05		0.68	110.67	0.072	4.427	0.006
CPO 2026.05		-0.12	25.67	-0.013	1.027	-0.005
POMDP 2026.05		-0.71	101.5	-0.063	4.06	-0.007
PCPO 2026.05		-0.73	5.67	-0.077	0.227	-0.129