Offline-to-Online Reinforcement Learning on D4RL (6 environments, min-max normalized, averaged)
Top entry: SMAC, Normalized Regret 0.031 (Feb 19, 2026)
Evaluation Results
Method      Online Algorithm   Date      Normalized Regret
SMAC        SAC                2026.02   0.031
SMAC        TD3                2026.02   0.09
SMAC        TD3+BC             2026.02   0.226
SMAC        AWR                2026.02   0.38
CalQL/CQL   TD3                2026.02   0.442
CalQL/CQL   SAC                2026.02   0.448
IQL         SAC                2026.02   0.471
CalQL/CQL   AWR                2026.02   0.482
IQL         TD3+BC             2026.02   0.494
IQL         AWR                2026.02   0.508
TD3+BC      TD3                2026.02   0.545
TD3+BC      TD3+BC             2026.02   0.562
CalQL/CQL   TD3+BC             2026.02   0.614
IQL         TD3                2026.02   0.653
TD3+BC      AWR                2026.02   0.654
TD3+BC      SAC                2026.02   0.962