
Multi-Objective Reinforcement Learning Based on Decomposition: A Taxonomy and Framework

About

Multi-objective reinforcement learning (MORL) extends traditional RL by seeking policies making different compromises among conflicting objectives. The recent surge of interest in MORL has led to diverse studies and solving methods, often drawing from existing knowledge in multi-objective optimization based on decomposition (MOO/D). Yet, a clear categorization based on both RL and MOO/D is lacking in the existing literature. Consequently, MORL researchers face difficulties when trying to classify contributions within a broader context due to the absence of a standardized taxonomy. To tackle such an issue, this paper introduces multi-objective reinforcement learning based on decomposition (MORL/D), a novel methodology bridging the literature of RL and MOO. A comprehensive taxonomy for MORL/D is presented, providing a structured foundation for categorizing existing and potential MORL works. The introduced taxonomy is then used to scrutinize MORL research, enhancing clarity and conciseness through well-defined categorization. Moreover, a flexible framework derived from the taxonomy is introduced. This framework accommodates diverse instantiations using tools from both RL and MOO/D. Its versatility is demonstrated by implementing it in different configurations and assessing it on contrasting benchmark problems. Results indicate MORL/D instantiations achieve comparable performance to current state-of-the-art approaches on the studied problems. By presenting the taxonomy and framework, this paper offers a comprehensive perspective and a unified vocabulary for MORL. This not only facilitates the identification of algorithmic contributions but also lays the groundwork for novel research avenues in MORL.
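To make "decomposition" concrete, the sketch below shows one common way a multi-objective problem is split into scalar subproblems: a set of weight vectors is generated, and each vector turns a vector-valued return into a single number through a scalarization function. This is a minimal illustration, not the paper's framework. The function names (`uniform_weights`, `weighted_sum`, `tchebycheff`), the restriction to two objectives, and the choice of these two scalarizations are assumptions made for the example.

```python
from typing import List, Sequence


def uniform_weights(n_points: int) -> List[List[float]]:
    """Evenly spaced weight vectors for two objectives (hypothetical helper).

    Each vector sums to 1; each one defines a scalar subproblem.
    """
    if n_points < 2:
        raise ValueError("need at least two weight vectors")
    step = 1.0 / (n_points - 1)
    return [[i * step, 1.0 - i * step] for i in range(n_points)]


def weighted_sum(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: dot product of weights and vector return."""
    return sum(w * r for w, r in zip(weights, returns))


def tchebycheff(
    returns: Sequence[float],
    weights: Sequence[float],
    ideal: Sequence[float],
) -> float:
    """Negated weighted Chebyshev distance to an ideal point.

    Negated so that, as with weighted_sum, higher values are better.
    """
    return -max(w * abs(z - r) for w, r, z in zip(weights, returns, ideal))


if __name__ == "__main__":
    vector_return = [1.0, 3.0]  # illustrative values, not from the paper
    for w in uniform_weights(3):
        print(w, weighted_sum(vector_return, w))
```

In a decomposition-based method, each weight vector would typically be paired with a policy (or a policy conditioned on the weights) trained to maximize its scalarized return; how subproblems share information is one of the design choices such a taxonomy has to cover.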

Florian Felten, El-Ghazali Talbi, Grégoire Danoy • 2023

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Continuous Control | MuJoCo Walker2d | Uncertainty Time (UT) | 0.05 | 11 |
| Continuous Control | MuJoCo Humanoid2d | UT Score | 2.71 | 11 |
| Continuous Control | MuJoCo Humanoid5d | Undiscounted Return (UT) | 0.58 | 11 |
| Continuous Control | MuJoCo Hopper3d | UT Score | 0.11 | 11 |
| Continuous Control | MuJoCo Halfcheetah2d | UT Score | 0.41 | 11 |
| Continuous Control | MuJoCo Ant3d | UT | 0.04 | 11 |
| Multi-objective Reinforcement Learning | Deep Sea Treasure | Hypervolume (HV) | 5.63 | 10 |
| Multi-objective Reinforcement Learning | Fruit Tree Navigation | UT | 4.19 | 7 |
| Multi-objective Reinforcement Learning | mo-walker2d v5 | Hypervolume (HV) | 6.52e+6 | 6 |
| Multi-objective Reinforcement Learning | mo-halfcheetah v5 | HV (x10^4) | 1.88e+3 | 6 |
Showing 10 of 12 rows
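Several rows above report hypervolume (HV), which measures the region of objective space dominated by a set of solutions relative to a reference point. The sketch below computes it for two objectives under maximization. It is an illustration only: the function name `hypervolume_2d` and the sample points are hypothetical, and the reference points behind the reported scores are not given on this page, so the numbers in the table cannot be reproduced from it.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded below by `ref` (maximization).

    Assumes two objectives; points that do not strictly improve on the
    reference point in both objectives contribute nothing.
    """
    candidates = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest.
    candidates.sort(key=lambda p: p[0], reverse=True)

    area = 0.0
    best_y = ref[1]  # highest second objective seen so far in the sweep
    for x, y in candidates:
        if y > best_y:
            # Add the horizontal strip not already covered by earlier points.
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


if __name__ == "__main__":
    front = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.0, 1.0)]  # made-up points
    print(hypervolume_2d(front, ref=(0.0, 0.0)))
```

Because the value depends on the reference point and on the scale of each objective, hypervolume scores are only comparable between methods evaluated with the same reference point on the same benchmark.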
