
CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering

About

Visual Question Answering (VQA) requires models to identify the correct answer options based on both visual and textual evidence. Recent Mixture-of-Experts (MoE) methods improve option reasoning by grouping similar concepts or routing based on examples. However, unstable routing can lead to inconsistent expert selection within the same question type, while overly stable routing may reduce flexibility. To address this, we propose a Concept-Guided Routing framework (CoGR-MoE), which incorporates the semantics of the answer options to guide expert selection during training. Option features are then used to reweight the selected experts, producing discriminative representations for each candidate option. These option-level representations are further used for option comparison and optimized via contrastive learning. Experimental results indicate that CoGR-MoE delivers strong performance across multiple VQA tasks, demonstrating the effectiveness of our approach.
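The three steps named in the abstract (option-guided expert selection, option-level reweighting of the selected experts, and contrastive comparison of options) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gating function, the expert architecture, or the loss form. Every name here (`route_with_options`, `option_representations`, `contrastive_loss`, `w_gate`, `w_concept`, `w_opt`), the mean-pooled option embedding, the tanh experts, and the InfoNCE-style loss are assumptions chosen only for illustration.

```python
import numpy as np


def route_with_options(fused, option_emb, w_gate, w_concept, k=2):
    """Select top-k experts from gate logits biased by option semantics.

    Assumption: the option semantics enter as a mean-pooled embedding that
    is added to the gate logits; the source does not specify this.
    """
    concept = option_emb.mean(axis=0)               # (d,)
    logits = fused @ w_gate + concept @ w_concept   # (n_experts,)
    topk = np.argsort(logits)[::-1][:k]             # indices of the k largest logits
    return topk, logits


def option_representations(fused, option_emb, experts, topk, w_opt):
    """Reweight the selected experts separately for each candidate option."""
    # Hypothetical experts: one linear map followed by tanh.
    expert_out = np.stack([np.tanh(fused @ experts[e]) for e in topk])  # (k, d)
    scores = option_emb @ w_opt[:, topk]            # (n_options, k)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ expert_out                     # (n_options, d)


def contrastive_loss(option_repr, option_emb, correct, temperature=0.1):
    """InfoNCE-style loss: the correct option is the positive, the rest negatives.

    Assumption: the abstract only says "contrastive learning"; this exact
    form is illustrative.
    """
    scores = (option_repr * option_emb).sum(axis=-1) / temperature  # (n_options,)
    shifted = scores - scores.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[correct], np.exp(log_probs)


# Toy usage with random weights (no trained parameters are implied).
rng = np.random.default_rng(0)
d, n_experts, n_options = 8, 4, 3
fused = rng.normal(size=d)                      # fused image+question feature
option_emb = rng.normal(size=(n_options, d))    # one embedding per answer option
w_gate = rng.normal(size=(d, n_experts))
w_concept = rng.normal(size=(d, n_experts))
w_opt = rng.normal(size=(d, n_experts))
experts = rng.normal(size=(n_experts, d, d))

topk, _ = route_with_options(fused, option_emb, w_gate, w_concept, k=2)
reps = option_representations(fused, option_emb, experts, topk, w_opt)
loss, probs = contrastive_loss(reps, option_emb, correct=1)
print(topk, reps.shape, float(loss))
```

In this sketch the same expert subset is shared by all options of a question (the consistency side), while the per-option weights let each candidate combine those experts differently (the flexibility side).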

Xiyin Zeng, Yi Lu, Hao Wang • 2026

Related benchmarks

Task                       Dataset      Result                  Rank
Visual Question Answering  VizWiz       Accuracy 84.8           1820
Visual Question Answering  GQA          Accuracy 83.2           1425
Visual Question Answering  ScienceQA    Accuracy 77.4           446
Visual Question Answering  VQA v2       Accuracy 88.5           333
Multimodal Evaluation      MM-Vet       --                      196
Multimodal Evaluation      MMStar       Accuracy 52             139
Visual Question Answering  MRAG-Bench   Overall Accuracy 68.96  14