
Neuro-Inspired Inverse Learning for Planning and Control

About

We present a neuro-inspired framework for embodied planning and control. It builds on three principles that enable fast and highly effective goal-directed behavior in the mammalian brain: paired forward/inverse internal models, open-loop multi-step motor commands, and sequential, hierarchical organization of action. Our Inverter framework uses learned components, trained end-to-end through Inverse Learning (IL) and supplemented where natural by analytic or algorithmic modules. We formalize IL and delineate it from supervised, reinforcement, and imitation learning. IL bridges two approaches: Reinforcement Learning (RL)-style amortization, which runs in a single forward pass but emits only one action at a time, and Optimal Control (OC)-style sequence planning, which covers whole trajectories but requires iterative test-time computation. Single Inverters or hierarchical n=2 Inverter stacks match or improve on offline-RL and diffusion-planner baselines on all 3 maze2d and 6 antmaze D4RL variants by an average of +24.2% (range -1.9% to +78.2%), with one to two orders of magnitude less inference compute time. Distinctively, optimizing through the Forward Model (FoM) over the entire T-step action sequence, rather than per step, lets Inverters produce smooth, goal-coherent, trajectory-wide structure and reach control policies closer to the analytic optimum than the policy underlying the training data itself. We also identify a failure mode of IL: FoM hacking under narrow training-data coverage, which we mitigate by using random training data with broader coverage. As an application example, a Pulse Inverter synthesizes arbitrary single-qubit quantum gates with fidelity matching the standard iterative numerical baseline (GRAPE), at more than 1000x lower per-gate compute time. We conclude that IL enables a versatile class of world-interfaces, especially for latency- and resource-critical embodied AI.

Maryna Kapitonova, Tonio Ball • 2026

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
| --- | --- | --- | --- | --- |
| Offline Reinforcement Learning | Maze2D medium v1 | Normalized Return | 166.8 | 30 |
| Offline Reinforcement Learning | Maze2D large v1 | Normalized Return | 220.7 | 30 |
| Planning and Control | maze2d-umaze v1 (100 episodes, 300 steps/ep) | Score | 165.2 | 16 |
| Offline Reinforcement Learning | AntMaze medium-play v2 | Average Score | 87.8 | 14 |
| Offline Reinforcement Learning | AntMaze Medium-Diverse v2 | Average Score | 0.965 | 14 |
| Offline Reinforcement Learning | AntMaze large-play v2 | D4RL Score | 0.93 | 11 |
| Offline Reinforcement Learning | AntMaze large-diverse v2 | D4RL Score | 94 | 11 |
| Offline Reinforcement Learning | AntMaze v2 | umaze Success Rate | 99.5 | 7 |
| Offline Reinforcement Learning | Antmaze v0 (test) | -- | -- | 5 |
| Locomotion | Antmaze u-umaze v2 | D4RL Score (%) | 99.5 | 2 |
Showing 10 of 27 rows
