Neuro-Inspired Inverse Learning for Planning and Control

About

We present a neuro-inspired framework for embodied planning and control. It builds on three principles that enable fast and highly effective goal-directed behavior in the mammalian brain: paired forward/inverse internal models, open-loop multi-step motor commands, and sequential, hierarchical organization of action. Our Inverter framework uses learned components, trained end-to-end through Inverse Learning (IL) and supplemented where natural by analytic or algorithmic modules; we formalize IL and delineate it from supervised, reinforcement, and imitation learning. IL bridges two regimes: Reinforcement Learning (RL)-style amortization, which runs in a single forward pass but emits only one action at a time, and Optimal Control (OC)-style sequence planning, which covers whole trajectories but requires iterative test-time computation. Single Inverters or hierarchical n=2 Inverter stacks match or improve on offline-RL and diffusion-planner baselines on all 3 maze2d and 6 antmaze D4RL variants by an average of +24.2% (range -1.9% to +78.2%), at one to two orders of magnitude less inference compute time. Distinctively, optimizing through the Forward Model (FoM) over the entire T-step action sequence, rather than per step, lets Inverters produce smooth, goal-coherent, trajectory-wide structure and reach control policies closer to the analytic optimum than the policy underlying the training data itself. We also identify a failure mode of IL: FoM hacking under narrow training-data coverage, which we mitigate by using random training data with broader coverage. As an application example, a Pulse Inverter synthesizes arbitrary single-qubit quantum gates with fidelity matching the standard iterative numerical baseline (GRAPE), at more than 1000x lower per-gate compute time. We conclude that IL enables a versatile class of world-interfaces, especially for latency- and resource-critical embodied AI.
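To make the core idea concrete, the toy sketch below trains an inverse model that emits a whole T-step action sequence in one pass, using a trajectory-level loss differentiated through a forward model. It is an illustration under stated assumptions, not the authors' implementation: the 1-D point dynamics, the linear inverse model, the loss (terminal goal error plus an effort penalty), and all names and constants (`forward_model`, `inverter`, `train`, `T`, `LAMBDA`, `LR`) are hypothetical. The forward model here is analytic rather than learned, which keeps the gradient simple enough to write by hand.

```python
import random

T = 5          # horizon: number of open-loop actions emitted in one pass (assumed)
LAMBDA = 0.01  # weight of the action-effort penalty (assumed)
LR = 0.02      # SGD learning rate (assumed)

def forward_model(x0, actions):
    """Toy analytic forward model: 1-D point with x_{t+1} = x_t + a_t."""
    x = x0
    for a in actions:
        x += a
    return x

def inverter(params, x0, goal):
    """Linear inverse model: maps (start, goal) to a full T-step action sequence."""
    return [wx * x0 + wg * goal + b for wx, wg, b in params]

def train(steps=2000, seed=0):
    """Fit the inverse model by descending a loss on the whole rollout."""
    rng = random.Random(seed)
    params = [[0.0, 0.0, 0.0] for _ in range(T)]
    for _ in range(steps):
        # Random (start, goal) pairs stand in for broad training coverage.
        x0, goal = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        actions = inverter(params, x0, goal)
        err = forward_model(x0, actions) - goal
        for t in range(T):
            # Loss L = err^2 + LAMBDA * sum(a_t^2); in this toy model
            # d(x_T)/d(a_t) = 1, so dL/da_t = 2*err + 2*LAMBDA*a_t.
            grad_a = 2.0 * err + 2.0 * LAMBDA * actions[t]
            params[t][0] -= LR * grad_a * x0
            params[t][1] -= LR * grad_a * goal
            params[t][2] -= LR * grad_a
    return params

params = train()
plan = inverter(params, -0.5, 0.8)    # one pass yields the whole sequence
final_x = forward_model(-0.5, plan)   # open-loop rollout of that plan
```

Because the loss is defined on the full rollout rather than on individual steps, the effort penalty spreads the displacement evenly across the T actions, which is a very small-scale analogue of the smooth, trajectory-wide structure described above. No target actions appear anywhere in training; the only supervision is the goal passed through the forward model.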

Maryna Kapitonova, Tonio Ball • 2026

Related benchmarks

Task	Dataset	Result
Offline Reinforcement Learning	Maze2D medium v1	Normalized Return 166.8	30
Offline Reinforcement Learning	Maze2D large v1	Normalized Return 220.7	30
Planning and Control	maze2d-umaze v1 (100 episodes, 300 steps/ep)	Score 165.2	16
Offline Reinforcement Learning	AntMaze medium-play v2	Average Score 87.8	14
Offline Reinforcement Learning	AntMaze Medium-Diverse v2	Average Score 0.965	14
Offline Reinforcement Learning	AntMaze large-play v2	D4RL Score 0.93	11
Offline Reinforcement Learning	AntMaze large-diverse v2	D4RL Score 94	11
Offline Reinforcement Learning	AntMaze v2	umaze Success Rate 99.5	7
Offline Reinforcement Learning	Antmaze v0 (test)	--	5
Locomotion	Antmaze u-umaze v2	D4RL Score (%) 99.5	2

Showing 10 of 27 rows
