
TPCL: Task Progressive Curriculum Learning for Robust Visual Question Answering

About

Visual Question Answering (VQA) systems are notoriously brittle under distribution shifts and data scarcity. While previous solutions, such as ensemble methods and data augmentation, can improve performance in isolation, they fail to generalise well across in-distribution (IID), out-of-distribution (OOD), and low-data settings simultaneously. We argue that this limitation stems from the suboptimal training strategies employed. Specifically, treating all training samples uniformly, without accounting for question difficulty or semantic structure, leaves the models vulnerable to dataset biases. Thus, they struggle to generalise beyond the training distribution. To address this issue, we introduce Task-Progressive Curriculum Learning (TPCL), a simple, model-agnostic framework that progressively trains VQA models using a curriculum built by jointly considering question type and difficulty. Specifically, TPCL first groups questions based on their semantic type (e.g., yes/no, counting) and then orders them using a novel Optimal Transport-based difficulty measure. Without relying on data augmentation or explicit debiasing, TPCL improves generalisation across IID, OOD, and low-data regimes and achieves state-of-the-art performance on VQA-CP v2, VQA-CP v1, and VQA v2. It outperforms the most competitive robust VQA baselines by over 5% and 7% on VQA-CP v2 and v1, respectively, and boosts backbone performance by up to 28.5%.
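The two-step curriculum described above (group by question type, then order the groups by an Optimal Transport-based score) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sample fields `qtype`, `loss_a`, and `loss_b` are hypothetical, the difficulty score is a stand-in (the 1-D Wasserstein-1 distance between a task's per-sample losses at two checkpoints, with a larger shift treated as harder), and the cumulative stage schedule is one possible reading of "progressively trains".

```python
from collections import defaultdict


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized empirical samples.

    For uniform weights and equal sizes, the optimal 1-D transport plan
    matches sorted values pairwise, so the distance is the mean absolute gap.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def order_tasks(samples):
    """Group samples by question type and rank the groups easy-to-hard.

    ASSUMPTION: difficulty is the OT distance between a task's loss
    distributions at two checkpoints ('loss_a', 'loss_b'); this is a
    stand-in score, not the measure defined in the paper.
    """
    groups = defaultdict(list)
    for s in samples:
        groups[s["qtype"]].append(s)
    scores = {
        qtype: wasserstein_1d(
            [s["loss_a"] for s in items], [s["loss_b"] for s in items]
        )
        for qtype, items in groups.items()
    }
    order = sorted(scores, key=scores.get)  # smallest score first
    return order, groups, scores


def progressive_stages(order, groups):
    """Yield (task, training pool) pairs, adding one task per stage.

    ASSUMPTION: the pool is cumulative, so earlier tasks stay in training.
    """
    pool = []
    for qtype in order:
        pool = pool + groups[qtype]
        yield qtype, list(pool)


# Toy data with made-up losses, purely to exercise the functions.
samples = [
    {"qtype": "yes/no", "loss_a": 0.30, "loss_b": 0.25},
    {"qtype": "yes/no", "loss_a": 0.40, "loss_b": 0.35},
    {"qtype": "counting", "loss_a": 1.20, "loss_b": 0.60},
    {"qtype": "counting", "loss_a": 1.00, "loss_b": 0.50},
]
order, groups, scores = order_tasks(samples)
stages = list(progressive_stages(order, groups))
for qtype, pool in stages:
    print(qtype, len(pool))
```

On this toy data the yes/no group receives the smaller score and is scheduled first, with the counting group added in the second stage. A real training loop would run one or more epochs on each stage's pool before moving to the next.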

Ahmed Akl, Abdelwahed Khamis, Zhe Wang, Ali Cheraghian, Sara Khalifa, Kewen Wang • 2024

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Visual Question Answering | VQA v2 (val) | Accuracy | 78.42 | 144 |
| Visual Question Answering | VQA-CP v2 (test) | Overall Accuracy | 77.23 | 128 |
| Visual Question Answering | VQA-CP v1 (test) | Accuracy (Overall) | 76.78 | 33 |
| Visual Question Answering | VQA-CP v2 | Overall Accuracy | 77.23 | 16 |
| Visual Question Answering | VQA v2 | Overall Accuracy | 78.42 | 15 |
