Frictional Q-Learning

About

Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold, where the support directions correspond to the tangential component, while the normal component captures the dominant first-order extrapolation error. This decomposition reveals an intrinsic anisotropy in value sensitivity that naturally induces a stability condition analogous to a friction threshold. To mitigate deviations toward unsupported actions, we propose Frictional Q-Learning, an off-policy algorithm that encodes supported actions as tangent directions using a contrastive variational autoencoder. We further show that an orthonormal basis of the orthogonal complement corresponds to normal components under mild local isometry assumptions. Extensive empirical results on standard continuous-control benchmarks consistently demonstrate robust and stable performance compared with competitive baselines.
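The tangent/normal split described in the abstract can be illustrated with a small linear-algebra sketch. This is not the authors' algorithm, which learns the supported directions with a contrastive variational autoencoder; it only assumes that a matrix `tangent_basis` spanning the supported directions at some point is already available. The function name, the argument names, and the toy numbers are hypothetical.

```python
import numpy as np


def split_tangent_normal(delta, tangent_basis):
    """Split an action deviation into tangent and normal components.

    delta:         (d,) deviation of a candidate action from a supported action.
    tangent_basis: (d, k) matrix whose columns span the assumed tangent space
                   (hypothetical input; the paper learns these directions).
    """
    # Orthonormalize the columns so that q @ q.T is an orthogonal projector.
    q, _ = np.linalg.qr(tangent_basis)
    tangent = q @ (q.T @ delta)   # part lying along supported directions
    normal = delta - tangent      # part in the orthogonal complement
    return tangent, normal


# Toy example: a 3-D action space whose supported directions are the x-y plane.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
delta = np.array([1.0, 2.0, 3.0])

tangent, normal = split_tangent_normal(delta, basis)
print("tangent component:", tangent)
print("normal component: ", normal)
print("normal norm:      ", np.linalg.norm(normal))
```

In this picture, a large normal norm indicates a deviation toward directions that the data does not support, which is the component the abstract associates with the dominant first-order extrapolation error.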

Hyunwoo Kim, Hyo Kyung Lee • 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Offline Reinforcement Learning | D4RL halfcheetah-medium-expert | Normalized Score | 85.25 | 169
Offline Reinforcement Learning | D4RL Hopper-medium-expert v2 | Normalized Return | 90.29 | 61
Offline Reinforcement Learning | D4RL Medium-Replay Walker2d | Normalized Score | 54.87 | 52
Continuous Control | MuJoCo Ant v4 | Average Return | 6.18e+3 | 46
Continuous Control | MuJoCo Walker2d v4 | Normalized Performance | 56.5986 | 39
Continuous Control | MuJoCo HalfCheetah v4 | Average Return | 1.60e+4 | 36
Offline Reinforcement Learning | D4RL hopper medium-replay | Reward | 73.33 | 32
Offline Reinforcement Learning | D4RL Halfcheetah medium | Reward | 50.66 | 30
Offline Reinforcement Learning | D4RL HalfCheetah Med-Replay | Normalized Avg Return | 46.23 | 22
Offline Reinforcement Learning | D4RL Walker2d medium | Normalized Avg Return | 44.6 | 20
Showing 10 of 17 rows
