Enhancing Human-Likeness in Reinforcement Learning Agents via Hierarchical Macro Action Quantization
About
Human-like agents are a long-standing goal of artificial intelligence. Despite strong performance, most reinforcement learning (RL) agents remain reward-driven and often behave differently from humans, which limits their interpretability and reliability. In this work, we introduce a novel human-like RL framework that predicts action sequences closely aligned with human behavior while maximizing reward. Specifically, we encode human demonstrations into macro actions using a hierarchical macro action quantization approach (termed HiMAQ) consisting of two successive levels of vector quantization. The lower quantization level maps input actions to fine-grained subaction clusters, while the higher quantization level aggregates these subaction clusters into action clusters. Extensive evaluations on the D4RL benchmarks show that our hierarchical approach outperforms the non-hierarchical baseline (MAQ), achieving better human-likeness scores while maintaining success rates comparable to or better than those of previous RL agents. The improvements hold when HiMAQ is integrated with different RL algorithms, namely IQL, SAC, and RLPD.
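The two-level quantization described above can be illustrated with a minimal sketch. It assumes fixed, hand-picked codebooks (in HiMAQ these would be learned from human demonstrations, and that learning procedure is not described in this summary), Euclidean nearest-neighbour assignment, fixed-length subaction chunks, and a higher level that quantizes the concatenation of the selected subaction code vectors. The names `hierarchical_quantize`, `nearest_code`, and `sub_len` are hypothetical and not taken from the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Standard VQ assignment: index of the codebook entry closest to vec."""
    return min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))


def hierarchical_quantize(
    macro_action: Sequence[Vector],
    sub_len: int,
    sub_codebook: Sequence[Vector],
    action_codebook: Sequence[Vector],
) -> Tuple[List[int], int]:
    """Hypothetical two-level VQ of a macro action (a list of per-step action vectors).

    Lower level: split the macro action into chunks of `sub_len` steps, flatten
    each chunk, and assign it to the nearest subaction code.
    Higher level (assumption): concatenate the chosen subaction code vectors and
    assign the result to the nearest action code.
    """
    if len(macro_action) % sub_len != 0:
        raise ValueError("macro action length must be a multiple of sub_len")
    sub_codes = []
    for start in range(0, len(macro_action), sub_len):
        chunk = [x for step in macro_action[start:start + sub_len] for x in step]
        sub_codes.append(nearest_code(chunk, sub_codebook))
    # Aggregate the quantized subactions into one vector for the higher level.
    stacked = [x for k in sub_codes for x in sub_codebook[k]]
    action_code = nearest_code(stacked, action_codebook)
    return sub_codes, action_code


# Toy example: 1-D actions, macro actions of 4 steps, subactions of 2 steps.
sub_codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
action_codebook = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], [1.0, 1.0, -1.0, -1.0]]
macro = [[0.9], [1.1], [-0.8], [-1.2]]
print(hierarchical_quantize(macro, 2, sub_codebook, action_codebook))  # ([1, 2], 2)
```

Here the "move up, then move down" macro action maps to subaction codes 1 and 2, which the higher level groups into action code 2. Storing only the higher-level index per macro action is one plausible way a policy could select among human-like action clusters, but how HiMAQ actually decodes and uses these codes is not stated in this summary.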
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Robotic Manipulation | Adroit Pen | Success Rate (SR) | 0.51 | 13 |
| Robotic Manipulation | Adroit Door | DTW Score (DTWs) | 0.87 | 8 |
| Robotic Manipulation | Adroit Hammer | DTWs | 0.64 | 8 |
| Robotic Manipulation | adroit-relocate | DTW (s) | 0.46 | 8 |
| Door | Adroit | DTWs | 0.83 | 4 |
| Hammer | Adroit | DTW Distance (Hammer) | 0.67 | 4 |
| Pen | Adroit | DTW Score (Success) | 0.61 | 4 |
| Relocate | Adroit | DTW | 37 | 4 |
| Robotic Manipulation | Adroit Average | DTW Score | 0.63 | 4 |
| Robotic Manipulation | Adroit Pen | DTW Distance | 0.57 | 4 |
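Most of the human-likeness results above are reported as dynamic time warping (DTW) scores or distances. As a rough illustration, the sketch below computes the classic DTW distance between a human and an agent trajectory using a Euclidean per-step cost. It assumes trajectories are lists of state or action vectors; the exact quantities compared and the normalization that turns raw distances into the scores in the table are not given in this summary, so the function `dtw_distance` is a hypothetical stand-in, not the benchmark's evaluation code.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic DTW distance between two trajectories (Euclidean step cost).

    cost[i][j] holds the minimal accumulated cost of aligning the first i
    points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # requires Python 3.8+
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


human = [[0.0], [1.0], [2.0]]
agent_paused = [[0.0], [0.0], [1.0], [2.0]]  # same path, one extra pause
agent_other = [[0.0], [2.0], [2.0]]          # skips the intermediate point
print(dtw_distance(human, agent_paused))  # 0.0
print(dtw_distance(human, agent_other))   # 1.0
```

Because DTW allows time shifts, an agent that follows the human path at a different pace scores a distance of zero, while one that takes a different path is penalized; lower raw distance therefore indicates more human-like behavior.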