Safe Reinforcement Learning on SafetyPointPush2-v0

Mean Reward: 0.6

PPO-LAG

Updated 12d ago

Evaluation Results

Method	Links	Mean Reward
PPO-LAG 2026.04		0.6	1.58	31.34	58.17
PPO 2026.04		0.41	3.11	59.86	120.18
TRPO 2026.04		0.2	2.5	106.88	216.19
PPO-FAB 2026.04		-0.07	0.52	7.7	37.51
TRPO-LAG 2026.04		-0.77	5.51	28.22	67.21