BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

About

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.
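The scaling argument in the abstract can be made concrete with a small sketch. The code below is not BraVE's algorithm. It only illustrates the counting: the joint action space is the product of the sub-action cardinalities, while a greedy root-to-leaf traversal of a tree over sub-actions scores a number of candidates that grows linearly with the number of sub-actions. The choice of 22 binary sub-actions (which gives just over four million joint actions), the function names, and `toy_score` are all assumptions made for illustration; `toy_score` is a hypothetical stand-in for a learned value estimate.

```python
from typing import Callable, Sequence, Tuple


def joint_action_count(cardinalities: Sequence[int]) -> int:
    """Size of the joint action space: the product of sub-action cardinalities."""
    total = 1
    for c in cardinalities:
        total *= c
    return total


def greedy_tree_traversal(
    cardinalities: Sequence[int],
    score: Callable[[Tuple[int, ...]], float],
) -> Tuple[Tuple[int, ...], int]:
    """Choose one sub-action per tree level, keeping the best-scoring child.

    `score` rates a partial action (a prefix of sub-action choices), so it can
    depend on earlier choices. Returns the chosen joint action and the number
    of score evaluations performed.
    """
    prefix: Tuple[int, ...] = ()
    evaluations = 0
    for c in cardinalities:
        best_choice, best_value = 0, float("-inf")
        for choice in range(c):
            value = score(prefix + (choice,))
            evaluations += 1
            if value > best_value:  # ties keep the earlier choice
                best_choice, best_value = choice, value
        prefix += (best_choice,)
    return prefix, evaluations


def toy_score(prefix: Tuple[int, ...]) -> float:
    """Hypothetical score: counts adjacent sub-actions that differ.

    The value of each choice depends on the previous one, so the sub-actions
    are not independent.
    """
    return float(sum(a != b for a, b in zip(prefix, prefix[1:])))


cardinalities = [2] * 22  # assumed: 22 binary sub-actions
total_actions = joint_action_count(cardinalities)
action, evaluations = greedy_tree_traversal(cardinalities, toy_score)
print(total_actions, evaluations, action)
```

Under these assumptions, exhaustive evaluation would score every one of the `total_actions` joint actions, whereas the traversal scores only the sum of the cardinalities. A greedy descent like this one is only as good as the scores assigned to partial actions, so in general it is not guaranteed to find the best joint action.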

Matthew Landers, Taylor W. Killian, Hugo Barnes, Thomas Hartvigsen, Afsaneh Doryab • 2024

Related benchmarks

Task | Dataset | Result | Rank
Molecule Design | Molecule Design 1,500 samples (train) | Reward (R-10) 7.271 | 13