Policy Optimization on Office World Map 4 Exp 6
Top-ranked method: QR-MAXRM, 5,630 Average Training Steps (Dec 16, 2025)
Evaluation Results
Method      Date      Average Training Steps
QR-MAXRM    2025.12   5,630
QR-MAX      2025.12   20,150
UCBVI-sB    2025.12   80,160
R-MAX       2025.12   82,000
UCBVI-B     2025.12   90,030
UCBVI-H     2025.12   101,120
QRM         2025.12   9,910,340