HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment

About

With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing methods rely on Euclidean-space metrics, which neglect the structured nature of semantic alignment, and they lack the ability to adapt to individual samples. To address these limitations, we propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Finally, we propose an adaptive modulation regressor that uses hyperbolic geometric features to generate sample-level modulation parameters, which adaptively calibrate the Euclidean cosine similarity to predict the final score. HyperAlign achieves highly competitive performance on both single-database evaluation and cross-database generalization tasks, validating the effectiveness of hyperbolic geometric modeling for image-text alignment assessment.
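The first step above (mapping Euclidean features into hyperbolic space) can be illustrated with a minimal sketch. The abstract does not say which hyperbolic model or curvature HyperAlign uses, so the Poincaré ball with curvature -1 is an assumption here, the function names `expmap0` and `poincare_distance` are hypothetical, and the short vectors stand in for real CLIP embeddings. This is not the authors' implementation.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Takes a Euclidean (tangent) vector and returns a point strictly
    inside the ball of radius 1/sqrt(c).
    """
    v = np.asarray(v, dtype=np.float64)
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between two points of the Poincare ball."""
    sq_dist = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_dist / np.maximum(denom, eps)
    return np.arccosh(arg) / np.sqrt(c)

# Toy stand-ins for CLIP text and image features (assumed values).
text_feat = np.array([0.3, 0.4])
image_feat = np.array([0.5, 0.1])

text_hyp = expmap0(text_feat)
image_hyp = expmap0(image_feat)
origin = np.zeros(2)

dist_text_image = poincare_distance(text_hyp, image_hyp)
dist_text_origin = poincare_distance(origin, text_hyp)
```

In this model, distance from the origin grows without bound as a point approaches the boundary of the ball, which is the property that entailment-cone methods typically exploit to encode hierarchy. How HyperAlign defines its cones and its modulation parameters is not specified in the abstract and is not modeled here.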

Wenzhi Chen, Bo Hu, Leida Li, Lihuo He, Wen Lu, Xinbo Gao • 2026

Related benchmarks

| Task | Dataset | Result | Rank |
| Image Quality Assessment | AGIQA-3K | SRCC 0.7927 | 112 |
| Visual Quality Assessment | AIGCIQA 2023 | SRCC 0.8078 | 34 |
| Alignment Quality Assessment | AIGCIQA2023 (test) | SRCC 0.6309 | 24 |
| Image Quality Assessment | PKU-I2IQA | SRCC 0.7977 | 11 |
| Image-Text Alignment Assessment | AGIQA3K (test) | SRCC 0.7013 | 10 |