HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding

About

Despite advancements in multimodal large language models (MLLMs), current approaches struggle in medium-to-long video understanding due to frame and context length limitations. As a result, these models often depend on frame sampling, which risks missing key information over time and lacks task-specific relevance. To address these challenges, we introduce HierarQ, a task-aware hierarchical Q-Former-based framework that sequentially processes frames to bypass the need for frame sampling, while avoiding the LLM's context length limitations. We introduce a lightweight two-stream language-guided feature modulator to incorporate task awareness in video understanding, with the entity stream capturing frame-level object information within a short context and the scene stream identifying their broader interactions over a longer period of time. Each stream is supported by a dedicated memory bank, which enables our proposed Hierarchical Querying transformer (HierarQ) to effectively capture short- and long-term context. Extensive evaluations on 10 video benchmarks across video understanding, question answering, and captioning tasks demonstrate HierarQ's state-of-the-art performance across most datasets, proving its robustness and efficiency for comprehensive video analysis.
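
As a rough illustration of why sequential processing with per-stream memory banks keeps the working context bounded regardless of video length, here is a minimal sketch. It is not the authors' implementation: the function name `process_video`, the capacities, and the first-in-first-out eviction policy are all assumptions made for this example, and plain integers stand in for frame features.

```python
from collections import deque


def process_video(frames, short_capacity=4, long_capacity=16):
    """Stream frames one at a time through two bounded memory banks.

    Hypothetical sketch: the capacities and FIFO eviction are assumptions,
    not details taken from the HierarQ paper.
    """
    entity_bank = deque(maxlen=short_capacity)  # short-term context (entity stream)
    scene_bank = deque(maxlen=long_capacity)    # longer-term context (scene stream)
    peak_items = 0

    for frame in frames:
        # Every frame is visited, so no frame sampling is needed.
        entity_bank.append(frame)  # oldest entry is evicted once the bank is full
        scene_bank.append(frame)
        peak_items = max(peak_items, len(entity_bank) + len(scene_bank))

    return list(entity_bank), list(scene_bank), peak_items


entity, scene, peak = process_video(range(100))
print(len(entity), len(scene), peak)
```

However many frames are streamed in, the number of stored items never exceeds `short_capacity + long_capacity`, which is the property that lets a downstream model stay within a fixed context budget.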

Shehreen Azad, Vibhav Vineet, Yogesh Singh Rawat • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Question Answering | MSRVTT-QA | -- | 481
Video Question Answering | MSVD-QA | -- | 340
Video Question Answering | ActivityNet-QA | -- | 319
Video Captioning | MSVD (test) | CIDEr 183.1 | 111
Video Captioning | YouCook2 | METEOR 18.1 | 104
Video Captioning | MSRVTT | -- | 101
Video Captioning | YouCook II (val) | CIDEr 136.1 | 98
Video Captioning | MSRVTT (test) | CIDEr 80.5 | 61
Long-form Video Understanding | LVU | Relation Attribute Accuracy 69.4 | 44
Long Video Question Answering | MovieChat-1K Global Mode (test) | Accuracy 87.5 | 24
Showing 10 of 15 rows
