Neurosymbolic Object-Centric Learning with Distant Supervision
About
Neurosymbolic learning can use symbolic rules to supervise latent concepts from weak labels, but it commonly assumes that the entities referenced by these rules are already specified. Object-centric models decompose images into slot-like representations; however, these slots are not necessarily aligned with the predicates required for symbolic reasoning. We investigate object-centric neurosymbolic learning under distant supervision, where the object-level arguments of a logic program are learned directly from images using only global task labels. We introduce DeepObjectLog, a probabilistic neurosymbolic model that integrates a slot-based perceptual encoder with a probabilistic logic layer. The encoder predicts objectness and class probabilities for each candidate object representation, while the logic layer marginalizes over the latent objectness and class assignments to compute the likelihood of the observed label. This formulation provides a differentiable, task-level learning signal for object-centric perception without requiring per-object labels, masks, bounding boxes, or heuristic set matching. Across diverse visual reasoning tasks, DeepObjectLog generalizes better out of distribution than neural object-centric and standard neurosymbolic baselines under compositional, object-count, and rule shifts.
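To make the marginalization step concrete, here is a minimal sketch for an addition-style task such as MultiMNIST-Addition. It assumes (hypothetically) that the label is the sum of the class values of the slots that are objects, that slots are independent given the image, and that class `k` has numeric value `k`. The function names `sum_likelihood` and `neg_log_likelihood` and the dynamic-programming formulation are illustrative assumptions, not DeepObjectLog's implementation, which uses a general probabilistic logic layer. Each slot contributes 0 with probability 1 − p_obj and value `k` with probability p_obj · P(class k); convolving these per-slot distributions yields the exact distribution of the sum.

```python
import numpy as np


def sum_likelihood(objectness, class_probs, label):
    """P(sum of class values of present objects == label).

    objectness:  shape (S,), P(slot s is an object)
    class_probs: shape (S, K), P(class k | slot s is an object); class k has value k
    Marginalizes over each slot's latent objectness and class, assuming
    slots are independent (an illustrative assumption).
    """
    objectness = np.asarray(objectness, dtype=float)
    class_probs = np.asarray(class_probs, dtype=float)
    num_slots = class_probs.shape[0]

    # dist[t] = P(running sum == t) over the slots processed so far.
    dist = np.array([1.0])
    for s in range(num_slots):
        # Per-slot value distribution: k if an object of class k ...
        contrib = objectness[s] * class_probs[s]
        # ... plus 0 if the slot is empty (no object).
        contrib[0] += 1.0 - objectness[s]
        # Sum of independent variables -> convolution of their distributions.
        dist = np.convolve(dist, contrib)

    if 0 <= label < len(dist):
        return float(dist[label])
    return 0.0


def neg_log_likelihood(objectness, class_probs, label, eps=1e-12):
    """Task-level loss: -log P(label), with eps guarding against log(0)."""
    return -np.log(sum_likelihood(objectness, class_probs, label) + eps)
```

Exhaustive enumeration over all objectness and class assignments grows exponentially with the number of slots, whereas this convolution stays polynomial because addition only depends on the running sum. Written with tensor operations in an autodiff framework, the same computation would pass gradients to both the objectness and the class probabilities using only the global label.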
Related benchmarks
| Task | Dataset | Metric | Result (%) | Rank |
|---|---|---|---|---|
| Image Classification | MM-A out-of-distribution (OOD) | Task Accuracy | 90 | 6 |
| Classification | PokerRules standard (test) | Task Accuracy | 97.9 | 6 |
| Image Classification | MM-A in-distribution (test) | Accuracy | 94.26 | 6 |
| Classification | PokerRules Extrapolation: 5 cards (In-distribution class) | Task Accuracy | 78.53 | 5 |
| Image Classification | MM-A Extrapolation 4 digits | Task Accuracy | 69.73 | 5 |
| Image Classification | MM-A Extrapolation 5 digits | Task Accuracy | 44.06 | 5 |
| Addition | CLEVR-Addition 7 objects (extrapolation) | Task Accuracy | 59.81 | 3 |
| Visual Digit Addition | MultiMNIST Addition (OOD Compositions) | Accuracy | 90 | 3 |
| Visual Digit Addition | MultiMNIST-Addition (Extrapolation (4 digits)) | Accuracy | 69.73 | 3 |
| Visual Digit Addition | MultiMNIST-Addition (Extrapolation (5 digits)) | Accuracy | 44.06 | 3 |