Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation
About
Query-based image retrieval (QBIR) requires retrieving relevant images for diverse and often stylistically heterogeneous queries, such as sketches, artworks, or low-resolution previews. While large-scale vision-language representation models (VLRMs) such as CLIP offer strong zero-shot retrieval performance, they struggle with the distribution shifts caused by unseen query styles. In this paper, we propose Hypernetwork-driven Style-adaptive Retrieval (Hystar), a lightweight framework that dynamically adapts model weights to the style of each query. Hystar employs a hypernetwork to generate singular-value perturbations ($\Delta S$) for attention layers, enabling flexible per-input adaptation, while static singular-value offsets on MLP layers provide stability across styles. To better handle semantic confusion across styles, Hystar also includes StyleNCE, an optimal-transport-weighted contrastive loss that emphasizes hard cross-style negatives. Extensive experiments on multi-style retrieval and cross-style classification benchmarks show that Hystar consistently outperforms strong baselines, achieving state-of-the-art performance while remaining parameter-efficient and stable across styles.
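The dynamic SVD modulation described in the abstract can be illustrated with a minimal NumPy sketch. It assumes that each frozen weight is factored once as $W = U\,\mathrm{diag}(S)\,V^\top$, that only the singular values are adapted, and that a small hypernetwork maps a per-query style embedding to one $\Delta S$ vector per attention layer. The class `StyleHypernet`, the helpers `svd_factors`, `rebuild` and `adapted_weights`, the layer shapes, and the random stand-in weights and style embeddings are all hypothetical; this is not the authors' implementation.

```python
import numpy as np


def svd_factors(weight):
    """Factor a frozen weight once: W = U @ diag(S) @ Vt (thin SVD)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u, s, vt


def rebuild(u, s, vt, delta_s):
    """Rebuild a weight with shifted singular values; U and Vt are never updated."""
    return (u * (s + delta_s)) @ vt


class StyleHypernet:
    """Hypothetical hypernetwork: style embedding -> one Delta S per attention layer."""

    def __init__(self, style_dim, hidden_dim, ranks, rng, scale=0.01):
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(style_dim), (style_dim, hidden_dim))
        # One small output head per attention layer keeps early Delta S close to zero.
        self.heads = [rng.normal(0.0, scale, (hidden_dim, r)) for r in ranks]

    def __call__(self, style_emb):
        h = np.tanh(style_emb @ self.w1)
        return [h @ head for head in self.heads]


rng = np.random.default_rng(0)
d = 16
attn_w = [rng.normal(size=(d, d)) for _ in range(2)]  # stand-ins for attention projections
mlp_w = rng.normal(size=(4 * d, d))                  # stand-in for an MLP projection

attn_factors = [svd_factors(w) for w in attn_w]
mlp_factors = svd_factors(mlp_w)
# Static, query-independent shift of the MLP singular values (a constant here).
mlp_offset = np.full(mlp_factors[1].shape, 0.05)

hyper = StyleHypernet(style_dim=8, hidden_dim=32,
                      ranks=[s.size for _, s, _ in attn_factors], rng=rng)


def adapted_weights(style_emb):
    """Per-query attention weights plus the shared, statically offset MLP weight."""
    deltas = hyper(style_emb)
    attn = [rebuild(u, s, vt, ds) for (u, s, vt), ds in zip(attn_factors, deltas)]
    mlp = rebuild(*mlp_factors, mlp_offset)
    return attn, mlp


sketch_style = rng.normal(size=8)  # hypothetical style embeddings for two queries
photo_style = rng.normal(size=8)
attn_a, mlp_a = adapted_weights(sketch_style)
attn_b, mlp_b = adapted_weights(photo_style)
```

Because $U$ and $V^\top$ stay frozen, the per-query adaptation amounts to one vector of singular-value shifts per attention layer, while the MLP weight is identical for every query. This mirrors the split between dynamic attention adaptation and static MLP offsets.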
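StyleNCE is described only as an optimal-transport-weighted contrastive loss that emphasizes hard cross-style negatives. The NumPy sketch below shows one plausible reading, offered as an assumption rather than the paper's exact loss. It computes an entropic OT plan between query and target embeddings with Sinkhorn iterations, then uses each row's off-diagonal transport mass to reweight the negatives in an InfoNCE denominator. The function names, the temperature `tau`, the entropic regularizer `eps`, and the weighting rule are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def sinkhorn(cost, eps=0.1, n_iters=50):
    """Entropic OT plan with uniform marginals via Sinkhorn-Knopp scaling."""
    n, m = cost.shape
    kernel = np.exp(-cost / eps)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def style_nce(queries, targets, tau=0.07, eps=0.1):
    """Hypothetical OT-weighted InfoNCE; row i of queries is paired with row i of targets."""
    q, t = l2_normalize(queries), l2_normalize(targets)
    sim = q @ t.T                        # (B, B) cosine similarities
    plan = sinkhorn(1.0 - sim, eps=eps)  # transport mass concentrates on similar pairs
    n = sim.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    neg_mass = np.where(off_diag, plan, 0.0)
    # Rescale each row so negative weights average to 1; harder negatives get larger weights.
    weights = neg_mass / neg_mass.sum(axis=1, keepdims=True) * (n - 1)
    weights = np.where(off_diag, weights, 1.0)  # the positive keeps weight 1
    logits = sim / tau
    row_max = logits.max(axis=1, keepdims=True)  # for numerical stability
    denom = (np.exp(logits - row_max) * weights).sum(axis=1)
    log_prob_pos = np.diag(logits) - row_max[:, 0] - np.log(denom)
    return float(-log_prob_pos.mean())


rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 16))                   # e.g. sketch-style query embeddings
targets = queries + 0.05 * rng.normal(size=(8, 16))  # matching gallery embeddings
aligned = style_nce(queries, targets)
shuffled = style_nce(queries, np.roll(targets, 1, axis=0))  # every pairing is wrong
```

Negatives that receive more transport mass, i.e. those most similar to the query, are weighted more heavily in the denominator, so confusing them costs more. With all weights equal to 1, the function reduces to standard InfoNCE.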
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Classification | DomainNet | Accuracy (clp) | 79.5 | 23 |
| Query-Based Image Retrieval | DSR | Art Top-1 Acc | 75.6 | 14 |
| Image Classification | ImageNet H | Top-1 Accuracy | 73.93 | 13 |
| Image Classification | ImageNet New classes 2009 | Top-1 Accuracy | 70.98 | 6 |
| Image Classification | ImageNet Base classes 2009 | Top-1 Accuracy | 77.13 | 6 |
| Image Classification | SUN397 Base classes 2010 | Top-1 Accuracy | 81.89 | 6 |
| Image Classification | SUN397 2010 (New classes) | Top-1 Accuracy | 78.41 | 6 |
| Image Classification | SUN397 2010 (H) | Top-1 Accuracy | 80.16 | 6 |
| Category-level retrieval | DomainNet coarse-grained | Clipart Top-1 Accuracy | 75.7 | 5 |
| Joint Style-Text Retrieval | DSR (test) | Art+Text Accuracy | 79.9 | 5 |
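The "H" rows are not defined on this page. Assuming, as is conventional for base-to-new generalization benchmarks, that H is the harmonic mean of base-class and new-class accuracy, the ImageNet H entry is consistent with the two ImageNet rows above it. The helper name `harmonic_mean` is hypothetical.

```python
def harmonic_mean(base_acc, new_acc):
    """H = 2 * base * new / (base + new), the assumed definition of the H rows."""
    return 2 * base_acc * new_acc / (base_acc + new_acc)


imagenet_h = round(harmonic_mean(77.13, 70.98), 2)
print(imagenet_h)  # 73.93
```

The harmonic mean rewards balance: a model that does well on base classes but poorly on new classes scores lower than the arithmetic mean would suggest.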