Safe Reinforcement Learning on 15-concept neural simulation environment (test)

Return: 34.64 (Unconstrained)

Updated 3mo ago

Evaluation Results

Method	Links	Return
Unconstrained 2026.04		34.64	69.92	91.7	19.02	27.14
Reward-Shaped 2026.04		34.64	69.92	91.7	19.02	27.14
Post-hoc 2026.04		34.64	69.92	91.7	19.02	27.14
MC-CPO 2026.04		32.73	44.47	87.1	9.69	23.14
MC-CPO (no frontier) 2026.04		32.68	44.51	87.2	9.69	23.12