
Multi-Objective Reinforcement Learning on Maze

Top result: RANDOM, 223.55 Mean Episode Reward (MER)

Updated 4mo ago

Evaluation Results

Method	Links	Mean Episode Reward (MER)	(unlabeled)
RANDOM 2026.03		223.55	0
MER-PPO 2026.03		85.55	0.05
Dense Oracle 2026.03		40.33	62.92
DPI 2026.03		30.16	59.04
DPI-PPO 2026.03		30.16	59.04
DPI 2026.03		27.35	42.94
RS 2026.03		23.66	0.01
FIXED 2026.03		16.15	1.12
SR-PPO 2026.03		15.28	61.13
ENVELOPE 2026.03		10.36	0.01
HEURISTIC 2026.03		3.65	0