
PRISM: Parallel Reward Integration with Symmetry for MORL

About

This work studies heterogeneous Multi-Objective Reinforcement Learning (MORL), where objectives can differ sharply in temporal frequency. Such heterogeneity allows dense objectives to dominate learning, while sparse long-horizon rewards receive weak credit assignment, leading to poor sample efficiency. We propose a Parallel Reward Integration with Symmetry (PRISM) algorithm that enforces reflectional symmetry as an inductive bias in aligning reward channels. PRISM introduces ReSymNet, a theory-motivated model that reconciles temporal-frequency mismatches across objectives, using residual blocks to learn a scaled opportunity value that accelerates exploration while preserving the optimal policy. We also propose SymReg, a reflectional equivariance regulariser that enforces agent mirroring and constrains policy search to a reflection-equivariant subspace. This restriction provably reduces hypothesis complexity and improves generalisation. Across MuJoCo benchmarks, PRISM consistently outperforms both a sparse-reward baseline and an oracle trained with full dense rewards, improving Pareto coverage and distributional balance: it achieves hypervolume gains exceeding 100% over the baseline and up to 32% over the oracle. The code is available at https://github.com/EVIEHub/PRISM.
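The SymReg idea described above can be illustrated with a minimal sketch: penalise the policy whenever mirroring the state does not produce the mirrored action. This is an illustrative reconstruction only, not the authors' implementation; the function names (`symreg_loss`, `reflect_state`, `reflect_action`) and the sign-flip representation of the reflection are assumptions for the example.

```python
import numpy as np

def reflect_state(s, state_flip):
    # Mirror a state under left-right reflection, represented here
    # (as an assumption) by flipping the sign of selected coordinates.
    return s * state_flip

def reflect_action(a, action_flip):
    # Mirror an action with the corresponding sign flips.
    return a * action_flip

def symreg_loss(policy, states, state_flip, action_flip):
    """Reflection-equivariance penalty, averaged over a batch:
    || M_a(policy(s)) - policy(M_s(s)) ||^2.
    The loss is zero iff the policy commutes with the reflection
    on the given states."""
    actions = policy(states)
    actions_on_mirrored = policy(reflect_state(states, state_flip))
    diff = reflect_action(actions, action_flip) - actions_on_mirrored
    return float(np.mean(np.sum(diff ** 2, axis=-1)))

# Example: the identity policy is reflection-equivariant, so the
# penalty vanishes; a shifted policy is not, so the penalty is positive.
states = np.array([[1.0, 2.0], [3.0, -1.0]])
flip = np.array([1.0, -1.0])          # flip the second coordinate
print(symreg_loss(lambda s: s, states, flip, flip))        # 0.0
print(symreg_loss(lambda s: s + 1.0, states, flip, flip))  # > 0
```

In training, such a term would typically be added to the policy objective with a weighting coefficient, steering the search toward the reflection-equivariant subspace the abstract refers to.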

Finn van der Knaap, Kejiang Qian, Zheng Xu, Fengxiang He • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
Multi-objective Reinforcement Learning | mo-hopper v5 | Hypervolume (HV) | 1.58e+7 | 3
Multi-objective Reinforcement Learning | mo-walker2d v5 | Hypervolume (HV) | 4.77e+4 | 3
Multi-objective Reinforcement Learning | mo-halfcheetah v5 | Hypervolume (HV) | 2.25e+4 | 3
Multi-objective Reinforcement Learning | mo-swimmer v5 | Hypervolume (HV) | 1.21e+4 | 3
