Predictive Reasoning with Augmented Anomaly Contrastive Learning for Compositional Visual Relations

About

While visual reasoning for simple analogies has received significant attention, compositional visual relations (CVR) remain relatively unexplored due to their greater complexity. To solve CVR tasks, i.e., to identify an outlier image given three other images that follow the same compositional rules, we propose Predictive Reasoning with Augmented Anomaly Contrastive Learning (PR-A$^2$CL). To address the challenge of modelling abundant compositional rules, an Augmented Anomaly Contrastive Learning scheme is designed to distil discriminative and generalizable features by maximizing similarity among normal instances while minimizing similarity between normal instances and anomalous outliers. More importantly, a predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three of the four images to predict those of the remaining one. In the subsequent verification stage, the PARBs progressively pinpoint the specific discrepancies attributable to the underlying rules. Experimental results on the SVRT, CVR and MC$^2$R datasets show that PR-A$^2$CL significantly outperforms state-of-the-art reasoning models.

Chengtai Li, Yuting He, Jianfeng Ren, Ruibin Bai, Yitian Zhao, Heng Yu, Xudong Jiang • 2026
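
The two ideas in the abstract, a contrastive objective that separates normal instances from the outlier and a predict-and-verify loop over the four images, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the mean-of-three predictor standing in for a PARB, the cosine-similarity verification score, the InfoNCE-style loss and the temperature value are all assumptions chosen for illustration, and feature vectors are plain Python lists rather than learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_from_others(features, held_out):
    """Hypothetical stand-in for a PARB: predict the held-out image's
    features as the mean of the other three (assumption, not the paper's block)."""
    others = [f for i, f in enumerate(features) if i != held_out]
    dim = len(others[0])
    return [sum(f[d] for f in others) / len(others) for d in range(dim)]


def find_outlier(features):
    """Predict-and-verify: hold out each image in turn, predict it from the
    rest, and flag the image whose prediction agrees least with its features."""
    scores = [
        cosine(predict_from_others(features, i), features[i])
        for i in range(len(features))
    ]
    outlier = min(range(len(scores)), key=scores.__getitem__)
    return outlier, scores


def anomaly_contrastive_loss(normals, anomaly, temperature=0.5):
    """InfoNCE-style loss (assumed form): each pair of normal instances is a
    positive pair, and the anomalous instance is the negative."""
    losses = []
    for i, anchor in enumerate(normals):
        for j, positive in enumerate(normals):
            if i == j:
                continue
            pos = math.exp(cosine(anchor, positive) / temperature)
            neg = math.exp(cosine(anchor, anomaly) / temperature)
            losses.append(-math.log(pos / (pos + neg)))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Three toy "rule-following" feature vectors and one outlier (made-up data).
    feats = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.0, 1.0]]
    outlier, scores = find_outlier(feats)
    print("outlier index:", outlier)
    print("loss:", anomaly_contrastive_loss(feats[:3], feats[3]))
```

On the toy vectors above, the fourth image (index 3) gets the lowest verification score and is flagged as the outlier. The loss decreases as the normal instances become more similar to each other and less similar to the anomaly, which is the behaviour the abstract describes for the contrastive stage.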

Related benchmarks

Task | Dataset | Metric | Result (%) | Rank
Abstract Visual Reasoning | SVRT reformulated four-choice (test) | Accuracy | 99.4 | 28
Compositional Visual Reasoning | CVR | Accuracy (Joint) | 97.1 | 16
Abstract Visual Reasoning | MC2R 20 samples (train) | Accuracy | 22.7 | 12
Abstract Visual Reasoning | MC2R 50 samples (train) | Accuracy | 32.1 | 12
Abstract Visual Reasoning | MC2R (100 train samples) | Accuracy | 39.8 | 12
Abstract Visual Reasoning | MC2R 200 (train) | Accuracy | 50.3 | 12
Abstract Visual Reasoning | MC2R 500 samples (train) | Accuracy | 63.1 | 12
Abstract Visual Reasoning | MC2R (train) | Accuracy | 77.4 | 12
Abstract Visual Reasoning | MC2R 10,000 samples (train) | Accuracy | 90.4 | 12
Compositional Visual Reasoning | CVR 34 (test) | Elementary Accuracy | 99.3 | 3
