Multi-Objective Reinforcement Learning on Maze
[Chart: Mean Episode Reward (MER). Top entry: RANDOM at 223.55, dated Mar 24, 2026. Selectable metrics: Mean Episode Reward (MER), Success Rate (SR).]
Updated 25d ago
Evaluation Results
| Method | Configuration | Date | Mean Episode Reward (MER) | Success Rate (SR) |
|---|---|---|---|---|
| RANDOM | | 2026.03 | 223.55 | 0 |
| MER-PPO | | 2026.03 | 85.55 | 0.05 |
| Dense Oracle | | 2026.03 | 40.33 | 62.92 |
| DPI | Algorithm=PPO | 2026.03 | 30.16 | 59.04 |
| DPI-PPO | | 2026.03 | 30.16 | 59.04 |
| DPI | Algorithm=Q-learning | 2026.03 | 27.35 | 42.94 |
| RS | | 2026.03 | 23.66 | 0.01 |
| FIXED | | 2026.03 | 16.15 | 1.12 |
| SR-PPO | | 2026.03 | 15.28 | 61.13 |
| ENVELOPE | | 2026.03 | 10.36 | 0.01 |
| HEURISTIC | | 2026.03 | 3.65 | 0 |