Our new X account is live! Follow @wizwand_team for updates
SOTA Policy Optimization benchmarks and papers with code | Wizwand
Policy Optimization
Benchmarks
| Dataset Name | SOTA Method | Metric | Value | Results | Last Updated |
|---|---|---|---|---|---|
| Office World MAP0 | QR-MAXRM | Avg Training Steps | 4,150 | 18 | 4d ago |
| Office World Map 3, Exp 5 | QR-MAXRM | Average Training Steps | 5,806 | 7 | 4d ago |
| Office World Map 2 Exp 5 | QR-MAXRM | Average Training Steps | 3,767 | 7 | 4d ago |
| Office World Map 4 Exp 6 | QR-MAXRM | Average Training Steps | 5,630 | 7 | 4d ago |
| Office World Map 1, Exp 5 | QR-MAXRM | Average Training Steps | 3,125 | 7 | 4d ago |
| Office World MAP4 | QR-MAXRM | Average Training Steps | 5,630 | 7 | 4d ago |
| Office World MAP1 | QR-MAXRM | Avg Training Steps | 3,125 | 7 | 4d ago |
| 10 agents, random subsets of warehouses (test) | max-quantile | Gini Index | 0.0625 | 6 | 4d ago |
| 5 symmetric agents, one per warehouse (test) | max-quantile | Gini Index | 0.0188 | 6 | 4d ago |
| MuJoCo Suite Summary | MAX-RETURN | Average Normalized Performance | 100 | 5 | 4d ago |
| MuJoCo HalfCheetah H=40 | MAX-RETURN | Return | 49.1 | 5 | 4d ago |
| MuJoCo HalfCheetah H=20 | MAX-RETURN | Return | 13.3 | 5 | 4d ago |
| MuJoCo HalfCheetah H=10 | OFF-SL | Return | 2.8 | 5 | 4d ago |
| MuJoCo Walker2d H=40 | MAX-RETURN | Return | 221.1 | 5 | 4d ago |
| MuJoCo Walker2d H=20 | MAX-RETURN | Return | 60.7 | 5 | 4d ago |
| MuJoCo Hopper H=40 | MAX-RETURN | Return | 71 | 5 | 4d ago |
| Policy Action Space | Policy gradient | Preprocessing Time | 0 | 1 | 4d ago |
| s-rectangular Robust MDP Discounted Reward | - | - | - | 0 | 4d ago |
| (s, a)-rectangular Robust MDP Discounted Reward | - | - | - | 0 | 4d ago |
| Non-rectangular Robust MDP Average Reward | - | - | - | 0 | 4d ago |
| Non-rectangular Robust MDP Discounted Reward | - | - | - | 0 | 4d ago |
Showing 21 of 21 rows