
Constrained Reinforcement Learning on AntReach

Episodic Reward: 102.3

CPO

Updated 5mo ago

Evaluation Results

Method (date)	Links	Episodic Reward	(unlabeled)
CPO 2025.12		102.3	35.1
P3O 2025.12		73.6	24.8
e-COP 2025.12		70.8	24.2
APPO 2025.12		61.5	24.5
PPO-L 2025.12		54.2	21.9
FOCOPS 2025.12		48.3	25.1
IPO 2025.12		45.2	24.9
PCPO 2025.12		39.4	27.9