CAGE-SGG: Counterfactual Active Graph Evidence for Open-Vocabulary Scene Graph Generation

About

Open-vocabulary scene graph generation (SGG) aims to describe visual scenes with flexible, fine-grained relation phrases beyond a fixed predicate vocabulary. While recent vision-language models greatly expand the semantic coverage of SGG, they also introduce a critical reliability issue: predicted relations may be driven by language priors or object co-occurrence rather than grounded visual evidence. In this paper, we propose an evidence-grounded open-vocabulary SGG framework based on counterfactual relation verification. Instead of directly accepting plausible relation proposals, our method verifies whether each candidate relation is supported by relation-specific visual, geometric, and contextual evidence. Specifically, we first generate open-vocabulary relation candidates with a vision-language proposer, then decompose predicate phrases into soft evidence bases such as support, contact, containment, depth, and state. A relation-conditioned evidence encoder extracts predicate-relevant cues, while a counterfactual verifier tests whether the relation score decreases when necessary evidence is removed and remains stable under irrelevant perturbations. We further introduce contradiction-aware predicate learning and graph-level preference optimization to improve fine-grained discrimination and global graph consistency. Experiments on conventional, open-vocabulary, and panoptic SGG benchmarks show that our method consistently improves standard recall-based metrics, unseen predicate generalization, and counterfactual grounding quality. These results demonstrate that moving from relation generation to relation verification leads to more reliable, interpretable, and evidence-grounded scene graphs.
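
The counterfactual test described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `counterfactual_check`, the thresholds `drop_margin` and `stability_tol`, the toy scorer, and the choice to "remove" evidence by zeroing a channel are all assumptions made for illustration. Only the evidence base names (support, contact, containment, depth, state) come from the abstract.

```python
from typing import Callable, Dict, Set

# Evidence is modeled as a mapping from evidence base name to a strength in [0, 1].
Evidence = Dict[str, float]


def counterfactual_check(
    score_fn: Callable[[Evidence], float],
    evidence: Evidence,
    necessary: Set[str],
    drop_margin: float = 0.2,     # hypothetical: minimum drop when necessary evidence is removed
    stability_tol: float = 0.05,  # hypothetical: maximum change under irrelevant perturbation
) -> bool:
    """Return True if the relation score behaves as grounded evidence would predict."""
    base = score_fn(evidence)

    # Counterfactual 1: remove (zero out) the evidence the predicate should depend on.
    ablated = {k: (0.0 if k in necessary else v) for k, v in evidence.items()}

    # Counterfactual 2: perturb only the evidence the predicate should not depend on.
    perturbed = {k: (v if k in necessary else 0.0) for k, v in evidence.items()}

    drops = (base - score_fn(ablated)) >= drop_margin
    stable = abs(base - score_fn(perturbed)) <= stability_tol
    return drops and stable


def toy_score(ev: Evidence) -> float:
    """Toy scorer for a 'sitting on'-style predicate that relies on support and contact."""
    weights = {"support": 0.6, "contact": 0.4}
    return sum(weights.get(k, 0.0) * v for k, v in ev.items())


evidence = {"support": 0.9, "contact": 0.8, "containment": 0.1, "depth": 0.5, "state": 0.3}
necessary = {"support", "contact"}

# A scorer that actually uses the evidence passes the check.
grounded = counterfactual_check(toy_score, evidence, necessary)

# A scorer driven purely by a prior (constant output) fails: its score never drops.
prior_driven = counterfactual_check(lambda ev: 0.9, evidence, necessary)
```

In this toy setting the evidence-based scorer passes and the constant, prior-driven scorer is rejected, which mirrors the failure mode the abstract attributes to language priors and object co-occurrence.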

Suiyang Guang, Chenyu Liu, Ruohan Zhang, Siyuan Chen • 2026

Related benchmarks

Task                             Dataset              Result              Rank
Scene Graph Generation           VG-150 (test)        R@50 73.4           24
Scene Graph Generation           VG150                mR@50 15.9          22
Panoptic Scene Graph Generation  PSG                  PR@50 40.6          10
Scene Graph Generation           OV-VG (test)         S-mR@50 28.9        6
Counterfactual Grounding         OV-VG (test)         CF-Acc 74.9         5
Scene Graph Generation           OV-VG biased (test)  Bias Accuracy 63.8  5
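
For readers unfamiliar with the recall-based metrics listed above, the following Python sketch shows how R@K and mR@K are commonly computed in the SGG literature. It is a simplified illustration, not the paper's evaluation code: it scores a single image, matches triplets by exact label identity (standard protocols also require box overlap), and the helper names and example triplets are hypothetical. The other metrics in the table (PR@50, S-mR@50, CF-Acc, Bias Accuracy) are not defined on this page and are not covered here.

```python
from collections import defaultdict
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (subject, predicate, object)


def recall_at_k(gt: List[Triplet], ranked_preds: List[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets found among the top-k predictions."""
    if not gt:
        return 0.0
    top_k = set(ranked_preds[:k])
    return sum(1 for t in gt if t in top_k) / len(gt)


def mean_recall_at_k(gt: List[Triplet], ranked_preds: List[Triplet], k: int) -> float:
    """Recall computed separately per predicate, then averaged over predicates."""
    if not gt:
        return 0.0
    top_k = set(ranked_preds[:k])
    hits, totals = defaultdict(int), defaultdict(int)
    for triplet in gt:
        predicate = triplet[1]
        totals[predicate] += 1
        hits[predicate] += int(triplet in top_k)
    return sum(hits[p] / totals[p] for p in totals) / len(totals)


# Hypothetical example: the frequent predicate "on" is recovered, the rare ones are not.
gt = [
    ("man", "on", "horse"),
    ("man", "on", "grass"),
    ("man", "wearing", "hat"),
    ("horse", "eating", "grass"),
]
preds = [  # already sorted by confidence, highest first
    ("man", "on", "horse"),
    ("man", "on", "grass"),
    ("man", "wearing", "shirt"),
]

r_at_3 = recall_at_k(gt, preds, k=3)        # 2 of 4 triplets recovered
mr_at_3 = mean_recall_at_k(gt, preds, k=3)  # mean of 1.0 ("on"), 0.0, 0.0
```

Because mR@K weights every predicate equally, a model that only recovers frequent predicates scores much lower on mR@K than on R@K, which is consistent with the gap between the R@50 and mR@50 rows above.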
