Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework

About

Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies that make different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solution methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To address this issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate that MORL/D instantiations achieve performance comparable to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL.
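To make the idea of decomposition concrete, the following is a minimal sketch, not the paper's implementation. It assumes two objectives that are both maximized, and uses two scalarization functions that are standard in decomposition-based multi-objective optimization (weighted sum and Tchebycheff). The function names and the `candidates` return vectors are hypothetical and chosen only for illustration.

```python
import numpy as np

def uniform_weights(n_points: int) -> np.ndarray:
    """Evenly spaced weight vectors for two objectives; each row sums to 1."""
    w1 = np.linspace(0.0, 1.0, n_points)
    return np.stack([w1, 1.0 - w1], axis=1)

def weighted_sum(returns: np.ndarray, weights: np.ndarray) -> float:
    """Weighted-sum scalarization; higher is better."""
    return float(np.dot(weights, returns))

def tchebycheff(returns: np.ndarray, weights: np.ndarray, ideal: np.ndarray) -> float:
    """Tchebycheff scalarization: weighted max distance to an ideal point; lower is better."""
    return float(np.max(weights * np.abs(ideal - returns)))

def best_per_weight(candidates: np.ndarray, weights: np.ndarray) -> list:
    """For each weight vector (one subproblem), pick the candidate with the best weighted sum."""
    return [int(np.argmax(candidates @ w)) for w in weights]

# Hypothetical vector returns of three policies on two conflicting objectives.
candidates = np.array([[1.0, 9.0], [5.0, 6.0], [8.0, 2.0]])
weights = uniform_weights(3)          # three subproblems
ideal = candidates.max(axis=0)        # best value seen per objective

chosen = best_per_weight(candidates, weights)
print(chosen)
```

Each weight vector defines one single-objective subproblem, and solving the subproblems side by side yields a set of policies with different trade-offs. In this toy example each of the three weight vectors selects a different candidate.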

Florian Felten, El-Ghazali Talbi, Grégoire Danoy • 2023

Related benchmarks

Task	Dataset	Result
Continuous Control	MuJoCo Walker2d	Uncertainty Time (UT): 0.05	11
Continuous Control	MuJoCo Humanoid2d	UT Score: 2.71	11
Continuous Control	MuJoCo Humanoid5d	Undiscounted Return (UT): 0.58	11
Continuous Control	MuJoCo Hopper3d	UT Score: 0.11	11
Continuous Control	MuJoCo Halfcheetah2d	UT Score: 0.41	11
Continuous Control	MuJoCo Ant3d	UT: 0.04	11
Multi-objective Reinforcement Learning	Deep Sea Treasure	Hypervolume (HV): 5.63	10
Multi-objective Reinforcement Learning	Fruit Tree Navigation	UT: 4.19	7
Multi-Objective Optimization	Complex SC (test)	Hv Mean: 0.3619	6
Multi-objective Reinforcement Learning	mo-walker2d v5	Hypervolume (HV): 6.52e+6	6

Showing 10 of 20 rows
