CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data
About
We present CurlingNet, an approach that measures the semantic distance of a composition of image and text embeddings. To learn an effective image-text composition for fashion-domain data, our model introduces two key components. First, Delivery transitions a source image in the embedding space. Second, Sweeping emphasizes query-related components of fashion images in the embedding space. A channel-wise gating mechanism makes this possible. Our single model outperforms previous state-of-the-art image-text composition models, including TIRG and FiLM. We participated in the first Fashion-IQ challenge at ICCV 2019, where an ensemble of our model achieved one of the best performances.
Youngjae Yu, Seunghwan Lee, Yuncheol Choi, Gunhee Kim • 2020
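To make the idea of channel-wise gating concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the function names (`channel_gate`, `apply_gate`), the single linear layer over the concatenated image and text vectors, and the toy values are all assumptions chosen for illustration. It only shows how a query-conditioned gate in (0, 1) can rescale each channel of an image embedding before a distance is measured.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gate(image_emb, text_emb, weights, bias):
    """Return one gate value in (0, 1) per image channel.

    Hypothetical layer (an assumption, not taken from the paper):
    a linear map over the concatenated [image; text] vector,
    followed by a sigmoid.
    """
    joint = list(image_emb) + list(text_emb)
    return [
        sigmoid(sum(w * x for w, x in zip(row, joint)) + b)
        for row, b in zip(weights, bias)
    ]


def apply_gate(image_emb, gates):
    """Scale each image channel by its gate (element-wise product)."""
    return [g * v for g, v in zip(gates, image_emb)]


def euclidean_distance(a, b):
    """Distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy example: all-zero weights and biases give every channel a gate of 0.5.
image_emb = [1.0, 2.0]
text_emb = [0.5, -0.5]
weights = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
bias = [0.0, 0.0]

gates = channel_gate(image_emb, text_emb, weights, bias)
composed = apply_gate(image_emb, gates)
```

In a trained model the weights would be learned, so that channels relevant to the text query receive gates near 1 and irrelevant channels are suppressed.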
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Composed Image Retrieval | FashionIQ (val) | Shirt Recall@10 | 21.45 | 455 |
| Composed Image Retrieval | Fashion-IQ (test) | Dress Recall@10 | 0.2615 | 145 |
| Image-Text Retrieval | Fashion-IQ (test) | Avg Recall@(10, 50) | 46.8 | 10 |
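The Recall@K metrics in the table can be illustrated with a short sketch. Recall@K is the fraction of queries whose target item appears among the top K retrieved items. The helper names and toy data below are hypothetical, and treating "Avg Recall@(10, 50)" as the plain mean of Recall@10 and Recall@50 is an assumption about how that figure is aggregated.

```python
def recall_at_k(rankings, targets, k):
    """Fraction of queries whose target appears in the top-k ranked items."""
    hits = sum(
        1 for ranking, target in zip(rankings, targets) if target in ranking[:k]
    )
    return hits / len(targets)


def average_recall(rankings, targets, ks):
    """Mean of Recall@K over several cutoffs (assumed aggregation)."""
    return sum(recall_at_k(rankings, targets, k) for k in ks) / len(ks)


# Toy example: two queries, each with a ranked list of candidate ids.
rankings = [["a", "b", "c"], ["c", "a", "b"]]
targets = ["b", "b"]

r1 = recall_at_k(rankings, targets, 1)  # neither target is ranked first
r2 = recall_at_k(rankings, targets, 2)  # only the first query hits
r3 = recall_at_k(rankings, targets, 3)  # both targets are in the top 3
```

This sketch returns fractions in [0, 1]. The results listed in the table appear to mix scales (for example 21.45 and 0.2615), so check whether a given entry is a percentage or a fraction before comparing values.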