Policy optimization on Office World Map 2 Exp 5
Top entry: QR-MAXRM, 3,767 average training steps (Dec 16, 2025)
[Chart: Average Training Steps]
Evaluation Results
Method      Date      Average Training Steps
QR-MAXRM    2025.12   3,767
QR-MAX      2025.12   83,076
QRM         2025.12   438,213
R-MAX       2025.12   723,441
UCBVI-sB    2025.12   1,150,000
UCBVI-B     2025.12   1,400,000
UCBVI-H     2025.12   1,890,000