Hadamard Product for Low-rank Bilinear Pooling
About
Bilinear models provide richer representations than linear models. They have been applied to various visual tasks, such as object recognition, segmentation, and visual question answering, achieving state-of-the-art performance by taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, which limits their applicability to computationally complex tasks. We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism in multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering tasks, achieving state-of-the-art results on the VQA dataset while being more parsimonious.
Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang • 2016
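The abstract names the technique but gives no formula, so the following is only a minimal NumPy sketch of one common way to write low-rank bilinear pooling: project both inputs into a shared d-dimensional space, combine them with an element-wise (Hadamard) product, then project to the output. The function name, the matrices `U`, `V`, `P`, the bias `b`, and all sizes are illustrative assumptions, not values taken from this page, and nonlinearities and the attention mechanism are omitted.

```python
import numpy as np

def low_rank_bilinear_pool(x, y, U, V, P, b):
    """Sketch: f = P^T ((U^T x) * (V^T y)) + b, with * the Hadamard product."""
    joint = (U.T @ x) * (V.T @ y)  # element-wise product in the shared d-dim space
    return P.T @ joint + b         # project the joint vector to c outputs

# Hypothetical sizes: input dims n and m, shared rank d, output dim c.
rng = np.random.default_rng(0)
n, m, d, c = 6, 5, 3, 2
x = rng.standard_normal(n)
y = rng.standard_normal(m)
U = rng.standard_normal((n, d))
V = rng.standard_normal((m, d))
P = rng.standard_normal((d, c))
b = rng.standard_normal(c)

pooled = low_rank_bilinear_pool(x, y, U, V, P, b)

# The same outputs written as full bilinear forms x^T W_k y + b_k,
# where each W_k = U diag(P[:, k]) V^T has rank at most d.
full = np.array([x @ (U @ np.diag(P[:, k]) @ V.T) @ y for k in range(c)]) + b

# Parameter counts (excluding the bias) for the two parameterizations.
full_params = n * m * c
low_rank_params = d * (n + m + c)
```

Under these assumptions, `pooled` and `full` agree up to floating-point error, while the factored form stores d(n + m + c) weights instead of n·m·c. That saving is what makes the low-rank form cheaper when the input dimensions are large.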
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Visual Question Answering | VQA v2 (test-dev) | Overall Accuracy: 66.27 | 664 |
| Visual Question Answering | VQA v2 (test-std) | Accuracy: 66.62 | 466 |
| Video Question Answering | MSRVTT-QA (test) | Accuracy: 76.1 | 371 |
| Visual Question Answering | VQA 2.0 (test-dev) | Accuracy: 66.27 | 337 |
| Visual Question Answering | VQA (test-dev) | Acc (All): 66.77 | 147 |
| Visual Question Answering | VQA (test-std) | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy: 64.89 | 100 |
| Visual Question Answering | VQA (val) | Overall Accuracy: 57.91 | 55 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-standard) | Overall Accuracy: 66.89 | 50 |
| Visual Question Answer | VQA 1.0 (test-dev) | Overall Accuracy: 66.77 | 44 |
Showing 10 of 22 rows