
Transitive RL: Value Learning via Divide and Conquer

About

In this work, we present Transitive Reinforcement Learning (TRL), a new value learning algorithm based on a divide-and-conquer paradigm. TRL is designed for offline goal-conditioned reinforcement learning (GCRL) problems, where the aim is to find a policy that can reach any state from any other state in the smallest number of steps. TRL converts a triangle inequality structure present in GCRL into a practical divide-and-conquer value update rule. This has several advantages compared to alternative value learning paradigms. Compared to temporal difference (TD) methods, TRL suffers less from bias accumulation, as in principle it only requires $O(\log T)$ recursions (as opposed to $O(T)$ in TD learning) to handle a length-$T$ trajectory. Unlike Monte Carlo methods, TRL suffers less from high variance as it performs dynamic programming. Experimentally, we show that TRL achieves the best performance in highly challenging, long-horizon benchmark tasks compared to previous offline GCRL algorithms.
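The divide-and-conquer recursion described above can be illustrated with a minimal sketch. This is not the paper's actual TRL update rule (the abstract does not specify it); it is a toy illustration, under the assumption that distances along a trajectory are combined via the triangle inequality d(i, j) ≤ d(i, m) + d(m, j) at a midpoint m, showing why a length-T segment needs only O(log T) recursion levels rather than the O(T) one-step backups of TD learning.

```python
# Toy sketch (NOT the paper's exact algorithm): estimate the step count
# between two trajectory indices by splitting at the midpoint and summing
# the two sub-estimates, as allowed by the triangle inequality
# d(i, j) <= d(i, m) + d(m, j). Each level halves the remaining horizon,
# so the recursion depth grows logarithmically in the segment length.

def dc_distance(i, j, depth=0):
    """Return (distance estimate, max recursion depth) for indices i <= j."""
    if j - i <= 1:               # base case: adjacent (or equal) states
        return j - i, depth
    m = (i + j) // 2             # midpoint state acts as a "waypoint"
    d_left, depth_left = dc_distance(i, m, depth + 1)
    d_right, depth_right = dc_distance(m, j, depth + 1)
    return d_left + d_right, max(depth_left, depth_right)

dist, levels = dc_distance(0, 1024)
print(dist, levels)  # 1024 steps recovered in only 10 recursion levels
```

In the actual algorithm the sub-distances would come from a learned value function rather than raw trajectory indices, but the same logarithmic depth argument is what limits bias accumulation relative to TD learning.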

Seohong Park, Aditya Oberai, Pranav Atreya, Sergey Levine · 2025

Related benchmarks

Task | Dataset | Metric | Result | Rank
Offline Reinforcement Learning | puzzle-4x4-play OGBench 5 tasks v0 | Average Success Rate | 34 | 18
Overall | puzzle 4x5 | Success Rate | 9.70e+3 | 10
Overall | puzzle 4x6 | Success Rate | 5.10e+3 | 10
task1 | humanoidmaze giant | Success Rate | 71 | 10
task1 | puzzle 4x6 | Success Rate | 1.00e+4 | 10
task2 | puzzle 4x5 | Success Rate | 9.90e+3 | 10
task2 | puzzle 4x6 | Success Rate | 66 | 10
task3 | puzzle 4x5 | Success Rate | 100 | 10
task3 | puzzle 4x6 | Success Rate | 6.70e+3 | 10
task4 | humanoidmaze giant | Success Rate | 9.40e+3 | 10

(Showing 10 of 28 rows.)
