
Constrained Reinforcement Learning on AntCircle

Best Episodic Reward: 198.6 (e-COP)


Evaluation Results

Method	Date	Episodic Reward	(unlabeled metric)
e-COP	2025.12	198.6	9.8
P3O	2025.12	182.6	9.8
PCPO	2025.12	168.3	9.5
FOCOPS	2025.12	161.9	9.9
APPO	2025.12	155.5	10
IPO	2025.12	149.3	9.5
PPO-L	2025.12	134.4	9.6
CPO	2025.12	127.1	10.1
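To make the spread between methods easier to read, here is a minimal sketch that ranks the entries by Episodic Reward and reports how far each one falls below the best score, as a percentage. The reward values are copied from the table; the second numeric column is left out because the page does not say what it measures. The helper name `gap_to_leader` is hypothetical and used only for illustration.

```python
# Episodic Reward values copied from the AntCircle table.
# The unlabeled second column is deliberately not used here.
rewards = {
    "e-COP": 198.6,
    "P3O": 182.6,
    "PCPO": 168.3,
    "FOCOPS": 161.9,
    "APPO": 155.5,
    "IPO": 149.3,
    "PPO-L": 134.4,
    "CPO": 127.1,
}


def gap_to_leader(scores):
    """Hypothetical helper: return (method, percent below best), best first."""
    best = max(scores.values())
    # Sort methods from highest to lowest reward.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Relative shortfall versus the top score, rounded to one decimal place.
    return [(name, round(100.0 * (best - s) / best, 1)) for name, s in ranked]


for name, gap in gap_to_leader(rewards):
    print(f"{name:8s} {gap:5.1f}% below best")
```

Under this measure, the runner-up P3O sits about 8.1% below e-COP, and CPO sits about 36.0% below it. This comparison uses reward alone and says nothing about how well each method satisfies the constraint.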