BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

About

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to $20\times$ in environments with over four million actions.
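To make the scaling argument concrete, here is a minimal toy sketch of greedy tree descent over binary sub-actions. It is not the authors' implementation: the names `q_value`, `branch_value`, and `traverse` are hypothetical, the binary sub-action encoding is an assumption, and `branch_value` is computed by brute force here purely as a stand-in for whatever learned estimator a real method would use. It only illustrates that, when the per-branch estimates are accurate, a traversal that queries a linear number of branches can recover the same action as an exhaustive search over all 2**N joint actions, even when sub-actions interact.

```python
from itertools import product

N = 4  # number of binary sub-actions (toy size; the joint space has 2**N actions)


def q_value(action):
    """Toy joint Q-value with interaction terms that a purely
    per-sub-action (factorized) sum could not represent."""
    a = action
    return a[0] + 2 * a[1] - 4 * a[0] * a[1] + 2 * a[2] * a[3]


calls = 0  # counts how many branch estimates the traversal requests


def branch_value(prefix):
    """Best Q-value reachable below a partial assignment.

    ASSUMPTION: computed by brute force as a stand-in for a learned
    estimator; this is only feasible because N is tiny.
    """
    global calls
    calls += 1
    remaining = N - len(prefix)
    return max(q_value(prefix + tail) for tail in product((0, 1), repeat=remaining))


def traverse():
    """Greedy descent: at each depth, keep the child with the higher estimate."""
    prefix = ()
    for _ in range(N):
        prefix = max((prefix + (bit,) for bit in (0, 1)), key=branch_value)
    return prefix


best = traverse()
exhaustive = max(product((0, 1), repeat=N), key=q_value)

print("tree traversal:", best, "using", calls, "branch estimates")
print("exhaustive    :", exhaustive, "over", 2 ** N, "joint actions")
```

In this toy setting the traversal requests 2N branch estimates rather than scoring all 2**N joint actions. As a purely arithmetic illustration, 22 binary sub-actions would already yield 2**22 = 4,194,304 joint actions.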

Matthew Landers, Taylor W. Killian, Hugo Barnes, Thomas Hartvigsen, Afsaneh Doryab • 2024

Related benchmarks

Task	Dataset	Result	Rank
Molecule Design	Molecule Design 1,500 samples (train)	Reward (R-10): 7.271	13
