MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings

About

A cognitive map is an internal model which encodes the abstract relationships among entities in the world, giving humans and animals the flexibility to adapt to new situations, with a strong out-of-distribution (OOD) generalization that current AI systems still do not possess. To bridge this gap, we introduce $\textit{MapFormers}$, new Transformer-based architectures, which can learn cognitive maps from observational data and perform path-integration without supervision. Cognitive maps are learned in the model by disentangling structural relationships in the inputs from their specific content, a property that can be achieved by updating position encodings with input-dependent matrices, built as exponentials of learned combinations of Lie-algebra generators. We developed two variants of $\textit{MapFormers}$ that unify absolute and relative positional encoding to model episodic (EM) and working memory (WM), respectively. We tested $\textit{MapFormers}$ on several formal tasks targeting distinct cognitive capacities, including gating, 2D navigation and nested hierarchies (Dyck Languages). Our results demonstrate that $\textit{MapFormers}$ significantly outperform current AI architectures, achieving near-perfect OOD generalization where standard models fail. Furthermore, we show that $\textit{MapFormers}$ are scalable; evaluations on naturalistic data yield perplexity improvements over baselines, suggesting that these principles extend to large-scale, real-world domains. These results are obtained through efficient parallel computation on commutative maps, though our models can also learn non-commutative cognitive maps via sequential path-integration. Overall, these results suggest that input-dependent matrices provide a critical structural bias, by disentangling abstract relations from content in order to drive robust OOD generalization.

Victor Rambaud, Salvador Mascarenhas, Yair Lakretz • 2025
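
The abstract describes position encodings updated by input-dependent matrices built as exponentials of Lie-algebra generators, with commutative maps allowing parallel computation and non-commutative maps requiring sequential path integration. The following is a minimal illustrative sketch of that idea, not the authors' implementation. It assumes the simplest commutative case, a single 2x2 rotation generator J, for which exp(theta * J) is an ordinary rotation matrix. The function names and the random per-token angles are hypothetical; in a model the angles would be predicted from the inputs.

```python
import numpy as np

def rotation(theta):
    """Closed form of exp(theta * J) for the generator J = [[0, -1], [1, 0]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def path_integrate_sequential(thetas):
    """Accumulate the input-dependent matrices one step at a time."""
    pos = np.eye(2)
    out = []
    for t in thetas:
        pos = rotation(t) @ pos  # left-multiply by the current step's matrix
        out.append(pos)
    return np.stack(out)  # shape (T, 2, 2)

def path_integrate_parallel(thetas):
    """Commutative shortcut: prefix-sum the angles, then exponentiate once."""
    cum = np.cumsum(thetas)
    c, s = np.cos(cum), np.sin(cum)
    top = np.stack([c, -s], axis=-1)
    bottom = np.stack([s, c], axis=-1)
    return np.stack([top, bottom], axis=-2)  # shape (T, 2, 2)

# Hypothetical per-token angles standing in for learned, input-dependent values.
rng = np.random.default_rng(0)
thetas = rng.normal(size=16)

# Both routes give the same positions because 2D rotations commute.
assert np.allclose(path_integrate_sequential(thetas),
                   path_integrate_parallel(thetas))
```

In this toy setting the product of matrix exponentials collapses to the exponential of a sum, so a cumulative sum replaces the step-by-step loop. With generators that do not commute, that shortcut no longer holds and only the sequential loop remains valid.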

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Grid Navigation | 1D Grid Navigation Sequence length 128, grid width 64 (IID) | Accuracy 100 | 12 |
| Grid Navigation | 1D Grid Navigation 64/32/0.2 (OOD-dense D) | Accuracy 100 | 12 |
| Grid Navigation | 1D Grid Navigation OOD-sparse S 256/128/0.8 | Accuracy 100 | 12 |
| Grid Navigation | 2D Grid Navigation Sequence length 128, grid width 64 (IID) | Accuracy 100 | 12 |
| Grid Navigation | 2D Grid Navigation D 64/32/0.2 (OOD-dense) | Accuracy 100 | 12 |
| Grid Navigation | 2D Grid Navigation S 256/128/0.8 (OOD-sparse) | Accuracy 100 | 12 |
| Selective-Copy Task | Selective-Copy OOD Dense split: 64/128 blank/non-blank | Accuracy 100 | 8 |
| Selective-Copy Task | Selective-Copy OOD Sparse 256 128 blank non-blank | Accuracy 100 | 8 |
| Selective-Copy Task | Selective-Copy IID 128/128 blank/non-blank | Accuracy 100 | 8 |
| Family tree navigation | Family tree navigation L=64, G=5 (IID) | Accuracy 100 | 7 |
Showing 10 of 17 rows
