$\text{H}^2$em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning

About

Compositional zero-shot learning (CZSL) aims to recognize unseen state-object compositions by generalizing from a training set of their primitives (state and object). Current methods often overlook the rich hierarchical structures, such as the semantic hierarchy of primitives (e.g., apple is a kind of fruit) and the conceptual hierarchy between primitives and compositions (e.g., sliced apple is a kind of apple). A few recent efforts have shown effectiveness in modeling these hierarchies through loss regularization within Euclidean space. In this paper, we argue that they fail to scale to the large-scale taxonomies required for real-world CZSL: the polynomial volume growth of flat geometry cannot match the exponential growth of these hierarchies, impairing generalization capacity. To this end, we propose H2em, a new framework that learns Hierarchical Hyperbolic EMbeddings for CZSL. H2em leverages the unique properties of hyperbolic geometry, a space naturally suited for embedding tree-like structures with low distortion. However, a naive hyperbolic mapping may suffer from hierarchical collapse and poor fine-grained discrimination. We therefore design two learning objectives to structure this space: a Dual-Hierarchical Entailment Loss that uses hyperbolic entailment cones to enforce the predefined hierarchies, and a Discriminative Alignment Loss with hard negative mining to establish a large geodesic distance between semantically similar compositions. Furthermore, we devise Hyperbolic Cross-Modal Attention to realize instance-aware cross-modal infusion within hyperbolic geometry. Extensive ablations on three benchmarks demonstrate that H2em establishes a new state-of-the-art in both closed-world and open-world scenarios. Our code will be released.
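To make the abstract's geometric argument concrete, here is a minimal sketch of geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which hyperbolic model or curvature H2em uses, so the unit ball with curvature -1, the function names, and the sample points below are all assumptions made for illustration and are not the authors' implementation.

```python
import numpy as np


def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance between two points strictly inside the unit
    Poincare ball (curvature -1). Assumed model, not taken from the paper."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    # Conformal factor: shrinks towards zero as either point nears the boundary.
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / max(denom, eps)))


def euclidean_distance(u, v):
    """Ordinary straight-line distance, for comparison."""
    return float(np.linalg.norm(np.asarray(u, dtype=float) - np.asarray(v, dtype=float)))


# Hypothetical points: both pairs are 0.1 apart in Euclidean terms,
# but one pair sits near the origin and the other near the boundary.
near_origin = ([0.0, 0.0], [0.1, 0.0])
near_boundary = ([0.85, 0.0], [0.95, 0.0])

for name, (a, b) in [("near origin", near_origin), ("near boundary", near_boundary)]:
    print(name, euclidean_distance(a, b), poincare_distance(a, b))
```

The same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary. That extra room towards the edge of the ball is what lets a tree with exponentially many leaves be embedded with low distortion, which is the property the abstract appeals to.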

Lin Li, Jiahui Li, Jiaming Lei, Jun Xiao, Feifei Shao, Long Chen • 2025

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Compositional Zero-Shot Learning | UT-Zappos Closed World | HM 59.8 | 42 |
| Compositional Zero-Shot Learning | C-GQA Closed World | HM 33.9 | 41 |
| Compositional Zero-Shot Learning | UT-Zappos open world | HM 54.5 | 38 |
| Compositional Zero-Shot Learning | MIT-States open world | HM 23.3 | 38 |
| Compositional Zero-Shot Learning | C-GQA open world | HM Score 15.2 | 35 |
| Compositional Zero-Shot Learning | MIT-States Closed World | Harmonic Mean (HM) 0.413 | 32 |