Neurosymbolic Object-Centric Learning with Distant Supervision

About

Neurosymbolic learning can use symbolic rules to provide supervision for latent concepts from weak labels, but it commonly assumes that the entities referenced by these rules are already specified. Object-centric models decompose images into slot-like representations; however, such slots are not necessarily aligned with the predicates required for symbolic reasoning. We investigate object-centric neurosymbolic learning under distant supervision, where the object-level arguments of a logic program are learned directly from images using only global task labels. We introduce DeepObjectLog, a probabilistic neurosymbolic model that integrates a slot-based perceptual encoder with a probabilistic logic layer. The encoder predicts objectness and class probabilities for candidate object representations, while the logic layer marginalizes over latent objectness and class assignments to compute the likelihood of the observed label. This formulation provides a differentiable task-level learning signal for object-centric perception without requiring per-object labels, masks, bounding boxes, or heuristic set matching. Evaluations across diverse visual reasoning tasks demonstrate that DeepObjectLog achieves superior out-of-distribution generalization to compositional, object-count, and rule shifts compared to neural object-centric and standard neurosymbolic baselines.
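The marginalization step described in the abstract can be illustrated with a small numeric sketch. This is not the authors' implementation: it assumes a hypothetical task in which the label is the sum of the class indices of all present objects (loosely inspired by the addition benchmarks listed on this page), and the function names `slot_contribution`, `label_likelihood`, and `brute_force_likelihood` are made up for illustration. It uses plain Python floats, so it shows only the probability computation, not the differentiable training signal.

```python
from itertools import product


def slot_contribution(objectness, class_probs):
    """Distribution over the value a single slot adds to the sum.

    Assumption: a present slot adds its class index, an absent slot adds 0.
    """
    contrib = [objectness * p for p in class_probs]
    contrib[0] += 1.0 - objectness  # absent slot contributes 0
    return contrib


def label_likelihood(objectness, class_probs, label):
    """P(sum == label), marginalizing objectness and class per slot.

    Convolves the per-slot contribution distributions one slot at a time.
    """
    dist = [1.0]  # distribution over the partial sum; starts at sum 0
    for o, p in zip(objectness, class_probs):
        contrib = slot_contribution(o, p)
        new = [0.0] * (len(dist) + len(contrib) - 1)
        for s, ps in enumerate(dist):
            for v, pv in enumerate(contrib):
                new[s + v] += ps * pv
        dist = new
    return dist[label] if 0 <= label < len(dist) else 0.0


def brute_force_likelihood(objectness, class_probs, label):
    """Same quantity by enumerating every objectness/class assignment."""
    n_slots = len(objectness)
    n_classes = len(class_probs[0])
    total = 0.0
    for present in product([0, 1], repeat=n_slots):
        for classes in product(range(n_classes), repeat=n_slots):
            weight = 1.0
            for k in range(n_slots):
                weight *= objectness[k] if present[k] else 1.0 - objectness[k]
                weight *= class_probs[k][classes[k]]
            # Only present slots count toward the observed label.
            if sum(c for b, c in zip(present, classes) if b) == label:
                total += weight
    return total
```

The brute-force version makes the sum over latent assignments explicit, while the convolution version computes the same likelihood without enumerating every assignment. In a trained model, the negative log of this likelihood would act as the task-level loss, with gradients flowing back to the predicted objectness and class probabilities.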

Stefano Colamonaco, David Debot, Giuseppe Marra • 2025

Related benchmarks

Task	Dataset	Metric	Result
Image Classification	MM-A out-of-distribution (OOD)	Task Accuracy	90	6
Classification	PokerRules standard (test)	Task Accuracy	97.9	6
Image Classification	MM-A in-distribution (test)	Accuracy	94.26	6
Classification	PokerRules Extrapolation: 5 cards (In-distribution class)	Task Accuracy	78.53	5
Image Classification	MM-A Extrapolation 4 digits	Task Accuracy	69.73	5
Image Classification	MM-A Extrapolation 5 digits	Task Accuracy	44.06	5
Addition	CLEVR-Addition 7 objects (extrapolation)	Task Accuracy	59.81	3
Visual Digit Addition	MultiMNIST Addition (OOD Compositions)	Accuracy	90	3
Visual Digit Addition	MultiMNIST-Addition (Extrapolation (4 digits))	Accuracy	69.73	3
Visual Digit Addition	MultiMNIST-Addition (Extrapolation (5 digits))	Accuracy	44.06	3

Showing 10 of 18 rows
