Hadamard Product for Low-rank Bilinear Pooling

About

Bilinear models provide richer representations than linear models. They have been applied to various visual tasks, such as object recognition, segmentation, and visual question answering, achieving state-of-the-art performance by taking advantage of the expanded representations. However, bilinear representations tend to be high-dimensional, which limits their applicability to computationally complex tasks. We propose low-rank bilinear pooling using the Hadamard product for an efficient attention mechanism in multimodal learning. We show that our model outperforms compact bilinear pooling on visual question-answering tasks, achieving state-of-the-art results on the VQA dataset while being more parsimonious.
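The idea in the abstract can be illustrated with a small sketch. It assumes the usual low-rank formulation, in which each bilinear weight matrix is factored so that the pooled output becomes f = P^T (U^T x ∘ V^T y), with ∘ the Hadamard (element-wise) product. The function names, the toy matrices, and the omission of bias terms and nonlinearities are illustrative assumptions, not the authors' implementation.

```python
def matvec_t(M, v):
    """Compute M^T v, where M is a list of len(v) rows."""
    cols = len(M[0])
    return [sum(M[i][k] * v[i] for i in range(len(v))) for k in range(cols)]


def low_rank_bilinear_pool(x, y, U, V, P):
    """Return P^T (U^T x * V^T y), with * the Hadamard product."""
    xu = matvec_t(U, x)                      # project x into the d-dim joint space
    yv = matvec_t(V, y)                      # project y into the same space
    joint = [a * b for a, b in zip(xu, yv)]  # element-wise (Hadamard) product
    return matvec_t(P, joint)                # map the joint vector to c outputs


def full_bilinear(x, y, U, V, P):
    """Reference: f_i = x^T W_i y with W_i = U diag(P[:, i]) V^T built explicitly."""
    n, m, d, c = len(x), len(y), len(P), len(P[0])
    out = []
    for i in range(c):
        W = [[sum(U[a][k] * P[k][i] * V[b][k] for k in range(d))
              for b in range(m)] for a in range(n)]
        out.append(sum(x[a] * W[a][b] * y[b]
                       for a in range(n) for b in range(m)))
    return out


# Toy inputs (hypothetical values): n = 3, m = 2, rank d = 2, outputs c = 2.
x = [1, 2, 3]
y = [4, 5]
U = [[1, 0], [0, 1], [1, 1]]   # n x d
V = [[1, 2], [0, 1]]           # m x d
P = [[1, -1], [2, 0]]          # d x c

pooled = low_rank_bilinear_pool(x, y, U, V, P)
reference = full_bilinear(x, y, U, V, P)
print(pooled, reference)
```

Both functions return the same vector, but the low-rank version never materializes the n × m matrices W_i. It stores (n + m + c) · d weights instead of n · m · c, which pays off when d is much smaller than the input dimensions.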

Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang • 2016

Related benchmarks

| Task | Dataset | Result | Rank |
| --- | --- | --- | --- |
| Visual Question Answering | VQA v2 (test-dev) | Overall Accuracy: 66.27 | 664 |
| Visual Question Answering | VQA v2 (test-std) | Accuracy: 66.62 | 466 |
| Video Question Answering | MSRVTT-QA (test) | Accuracy: 76.1 | 371 |
| Visual Question Answering | VQA 2.0 (test-dev) | Accuracy: 66.27 | 337 |
| Visual Question Answering | VQA (test-dev) | Acc (All): 66.77 | 147 |
| Visual Question Answering | VQA (test-std) | -- | 110 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-dev) | Overall Accuracy: 64.89 | 100 |
| Visual Question Answering | VQA (val) | Overall Accuracy: 57.91 | 55 |
| Open-Ended Visual Question Answering | VQA 1.0 (test-standard) | Overall Accuracy: 66.89 | 50 |
| Visual Question Answer | VQA 1.0 (test-dev) | Overall Accuracy: 66.77 | 44 |
Showing 10 of 22 rows
