Policy Optimization on Office World MAP1
[Chart: Avg Training Steps by date. Highlighted entry: QR-MAXRM, 3,125 Avg Training Steps, Dec 16, 2025]
Updated 4d ago
Evaluation Results
Method      Exp    Date      Avg Training Steps
QR-MAXRM    EXP5   2025.12   3,125
QR-MAX      EXP5   2025.12   24,222
QRM         EXP5   2025.12   225,140
UCBVI-sB    EXP5   2025.12   250,800
R-MAX       EXP5   2025.12   272,080
UCBVI-B     EXP5   2025.12   555,977
UCBVI-H     EXP5   2025.12   1,320,070