Transitive RL: Value Learning via Divide and Conquer
About
In this work, we present Transitive Reinforcement Learning (TRL), a new value learning algorithm based on a divide-and-conquer paradigm. TRL is designed for offline goal-conditioned reinforcement learning (GCRL) problems, where the aim is to find a policy that can reach any state from any other state in the smallest number of steps. TRL converts a triangle inequality structure present in GCRL into a practical divide-and-conquer value update rule. This has several advantages compared to alternative value learning paradigms. Compared to temporal difference (TD) methods, TRL suffers less from bias accumulation, as in principle it only requires $O(\log T)$ recursions (as opposed to $O(T)$ in TD learning) to handle a length-$T$ trajectory. Unlike Monte Carlo methods, TRL suffers less from high variance as it performs dynamic programming. Experimentally, we show that TRL achieves the best performance in highly challenging, long-horizon benchmark tasks compared to previous offline GCRL algorithms.
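The divide-and-conquer recursion can be illustrated with a minimal tabular sketch (a hypothetical simplification, not the paper's implementation: the function name `dc_distance` and the exact-trajectory setting are assumptions for illustration). For a length-$T$ trajectory segment, the distance estimate between steps $i$ and $j$ is formed by splitting at a midpoint and combining the two halves via the triangle inequality, so only $O(\log T)$ recursion levels are needed, versus the $O(T)$ one-step backups of TD learning:

```python
def dc_distance(i: int, j: int, cache=None) -> int:
    """Estimated number of steps from trajectory index i to j (i <= j).

    Hypothetical sketch of the divide-and-conquer idea: split the
    segment at a midpoint m and combine the halves, exploiting the
    triangle inequality d(i, j) <= d(i, m) + d(m, j).
    """
    if cache is None:
        cache = {}
    if j - i <= 1:          # base case: identical or adjacent states
        return j - i
    if (i, j) not in cache:
        m = (i + j) // 2    # split the segment at its midpoint
        # combine the two sub-segment estimates (triangle inequality)
        cache[(i, j)] = dc_distance(i, m, cache) + dc_distance(m, j, cache)
    return cache[(i, j)]
```

In the actual algorithm, the exact sub-segment distances above would be replaced by a learned goal-conditioned value function and the midpoint would be chosen among candidate waypoints; the sketch only shows why the recursion depth for a length-$T$ segment is logarithmic rather than linear.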
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 34 | 18 |
| Overall | puzzle 4x5 | Success Rate | 9.70e+3 | 10 |
| Overall | puzzle 4x6 | Success Rate | 5.10e+3 | 10 |
| task1 | humanoidmaze giant | Success Rate | 71 | 10 |
| task1 | puzzle 4x6 | Success Rate | 1.00e+4 | 10 |
| task2 | puzzle 4x5 | Success Rate | 9.90e+3 | 10 |
| task2 | puzzle 4x6 | Success Rate | 66 | 10 |
| task3 | puzzle 4x5 | Success Rate | 100 | 10 |
| task3 | puzzle 4x6 | Success Rate | 6.70e+3 | 10 |
| task4 | humanoidmaze giant | Success Rate | 9.40e+3 | 10 |