Counterfactual Policy Evaluation on GridWorld p = 0.4

Worst-Case Counterfactual V(s0)

[Chart: Worst-Case Counterfactual V(s0) by date (Feb 19, 2025); labelled method: Gumbel-max]
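This page names Gumbel-max as an evaluated method but does not describe it. As background only, here is a minimal sketch of the generic Gumbel-max counterfactual step for a single categorical transition. It assumes the next-state distribution is given as logits (log-probabilities). The function names `posterior_gumbels` and `counterfactual_outcome` and the example probabilities are hypothetical illustrations, not the leaderboard entry's implementation. The sketch first infers Gumbel noise consistent with an observed outcome, then reuses that noise under a counterfactual distribution.

```python
import numpy as np


def posterior_gumbels(logits, observed, rng):
    """Sample Gumbel noise consistent with argmax(logits + noise) == observed.

    Top-down construction: the largest perturbed logit is distributed as
    Gumbel(logsumexp(logits)); every other perturbed logit is a
    Gumbel(logits[i]) truncated to lie below that maximum.
    """
    logits = np.asarray(logits, dtype=float)
    # Value of the maximum perturbed logit (attained by the observed outcome).
    top = np.logaddexp.reduce(logits) + rng.gumbel()
    perturbed = np.empty_like(logits)
    for i, mu in enumerate(logits):
        if i == observed:
            perturbed[i] = top
        else:
            g = mu + rng.gumbel()
            # Truncated Gumbel: -log(exp(-top) + exp(-g)) is always below top.
            perturbed[i] = -np.logaddexp(-top, -g)
    # Exogenous noise per outcome, i.e. perturbed logit minus its location.
    return perturbed - logits


def counterfactual_outcome(noise, cf_logits):
    """Replay the inferred noise under a (possibly different) distribution."""
    return int(np.argmax(np.asarray(cf_logits, dtype=float) + noise))


# Hypothetical example: factual vs. counterfactual transition probabilities.
rng = np.random.default_rng(0)
obs_logits = np.log([0.6, 0.3, 0.1])
cf_logits = np.log([0.2, 0.5, 0.3])
noise = posterior_gumbels(obs_logits, observed=1, rng=rng)
print(counterfactual_outcome(noise, obs_logits))  # 1 by construction
print(counterfactual_outcome(noise, cf_logits))   # sampled counterfactual outcome
```

Replaying the same noise under the unchanged logits always reproduces the observed outcome. This is the consistency property that makes the sampled outcome under the alternative distribution a counterfactual rather than an independent draw.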

Evaluation Results

Date      Worst-Case Counterfactual V(s0)
2025.02   230
2025.02   81.4