Off-policy learning on Simulation Blocks 2-5: Cross-block averages
[Chart: Average Value by date (axis date shown: Apr 24, 2026). Top entry: Ma-style OPL, Average Value 0.7216. Selectable metrics: Average Value, Average Regret, Average Burden.]
Evaluation Results
| Method | Parameter | Date | Average Value | Average Regret | Average Burden |
|---|---|---|---|---|---|
| Ma-style OPL | | 2026.04 | 0.7216 | 0.1079 | 38.73 |
| DR value only | | 2026.04 | 0.7176 | 0.1119 | 31.27 |
| DR-LCB | 0.50 | 2026.04 | 0.7175 | 0.112 | 31.01 |
| CASP | 0.05 | 2026.04 | 0.7147 | 0.1148 | 26.77 |
| Plug-in | | 2026.04 | 0.7122 | 0.1173 | 44.98 |
| Wang-style generator | | 2026.04 | 0.696 | 0.1335 | 43.5 |
| Stagewise | | 2026.04 | 0.6771 | 0.1524 | 38.9 |
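The Average Value and Average Regret figures listed above can be cross-checked against each other. The sketch below copies the reported numbers and sums the two metrics per method; every sum comes to the same constant, which suggests (as an inference from the numbers, not something the page states) that regret is measured against a shared reference value. The names `rows`, `sums`, and `ranked` are illustrative only.

```python
# Reported results: (method, average value, average regret, average burden).
rows = [
    ("Ma-style OPL", 0.7216, 0.1079, 38.73),
    ("DR value only", 0.7176, 0.1119, 31.27),
    ("DR-LCB", 0.7175, 0.112, 31.01),
    ("CASP", 0.7147, 0.1148, 26.77),
    ("Plug-in", 0.7122, 0.1173, 44.98),
    ("Wang-style generator", 0.696, 0.1335, 43.5),
    ("Stagewise", 0.6771, 0.1524, 38.9),
]

# Sum value and regret per method, rounded to the table's four decimals.
sums = {name: round(value + regret, 4) for name, value, regret, _ in rows}

# Rank methods by average value, highest first (the order used in the table).
ranked = [name for name, *_ in sorted(rows, key=lambda r: r[1], reverse=True)]

for name in ranked:
    print(f"{name:22s} value + regret = {sums[name]}")
```

If the shared-reference reading is right, ranking by Average Value and ranking by Average Regret carry the same information, so Average Burden is the only metric that orders the methods differently.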