Answering Questions about Data Visualizations using Efficient Bimodal Fusion

About

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs. CQA requires capabilities that natural-image VQA algorithms lack: fine-grained measurements, optical character recognition, and handling out-of-vocabulary words in both questions and answers. Without modifications, state-of-the-art VQA algorithms perform poorly on this task. Here, we propose a novel CQA algorithm called parallel recurrent fusion of image and language (PReFIL). PReFIL first learns bimodal embeddings by fusing question and image features and then intelligently aggregates these learned embeddings to answer the given question. Despite its simplicity, PReFIL greatly surpasses state-of-the-art systems and human baselines on both the FigureQA and DVQA datasets. Additionally, we demonstrate that PReFIL can be used to reconstruct tables by asking a series of questions about a chart.
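
The table-reconstruction idea in the last sentence of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `ask` callable stands in for any trained CQA model, the question template is a hypothetical phrasing, and the row and column labels are assumed to be known already (for example, from a separate text-reading step). The stub model below simply looks answers up in a dictionary so that the example runs on its own.

```python
from typing import Callable, Dict, List, Tuple


def reconstruct_table(
    ask: Callable[[str], str],
    rows: List[str],
    cols: List[str],
) -> Dict[str, Dict[str, str]]:
    """Rebuild a chart's underlying table by asking one question per cell."""
    table: Dict[str, Dict[str, str]] = {}
    for row in rows:
        table[row] = {}
        for col in cols:
            # Hypothetical question template; a real dataset defines its own phrasing.
            table[row][col] = ask(f"What is the value of {row} in {col}?")
    return table


def make_stub_model(truth: Dict[Tuple[str, str], int]) -> Callable[[str], str]:
    """Stand-in for a trained CQA model: answers from a known lookup table."""
    def ask(question: str) -> str:
        for (row, col), value in truth.items():
            if question == f"What is the value of {row} in {col}?":
                return str(value)
        return "unknown"
    return ask


# Illustrative data only; these numbers do not come from any dataset.
truth = {
    ("apples", "2018"): 3,
    ("apples", "2019"): 5,
    ("pears", "2018"): 2,
    ("pears", "2019"): 7,
}
table = reconstruct_table(make_stub_model(truth), ["apples", "pears"], ["2018", "2019"])
print(table)
```

In this sketch every cell costs one model query, so a chart with R rows and C columns needs R × C questions. A real system would also need questions that discover the labels themselves.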

Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan • 2019

Related benchmarks

Task	Dataset	Result
Visual Question Answering	ChartQA (test)	Accuracy 4.8	93
Visual Question Answering	ChartQA (val)	Accuracy 4.53	25
Chart Question Answering	DVQA novel (test)	Structure 99.78	10
Visual Question Answering	FigureQA Alternate color scheme (val)	Accuracy 93.26	10
Visual Question Answering	DVQA novel (test)	Accuracy (Oracle) 96.53	10
Visual Question Answering	DVQA familiar (test)	Accuracy (Oracle) 96.37	10
Chart Question Answering	DVQA (test-familiar)	Structure 99.77	9
Figure Visual Question Answering	FigureQA 1.0 (test 2)	Overall Accuracy 93.16	9
Visual Question Answering	FigureQA (val1)	Accuracy 94.84	9
Visual Question Answering	FigureQA (val2)	Accuracy 93.26	9

