
Look Beyond Saliency: Low-Attention Guided Dual Encoding for Video Semantic Search

About

Video semantic search in densely crowded scenes remains a challenging task because visual encoders tend to prioritize salient foreground regions while neglecting contextually important background areas. We propose an Inverse Attention Embedding mechanism that explicitly captures and highlights these overlooked regions. By combining inverse attention embeddings with traditional visual embeddings, our method significantly enhances semantic retrieval performance without additional training. Initial experiments and ablation studies demonstrate promising recall improvements over existing approaches for video semantic search in crowded environments.
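The abstract does not spell out how the inverse attention embedding is computed or how the two embeddings are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes per-patch feature vectors and per-patch attention scores are already available from a frozen encoder, inverts the normalized attention as `1 - a`, pools the patches under both weightings, and blends the two pooled vectors with a hypothetical mixing weight `alpha`. All function names and the blending rule are illustrative assumptions.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def pool(patches, weights):
    """Weighted sum of patch feature vectors."""
    dim = len(patches[0])
    return [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]


def l2_normalize(vec):
    """Return vec scaled to unit length (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dual_embedding(patches, attention, alpha=0.5):
    """Blend a saliency-weighted and an inverse-attention-weighted embedding.

    ASSUMPTIONS (not from the source): inverse attention is 1 - a on the
    normalized scores, and the two embeddings are mixed linearly by alpha.
    """
    if len(patches) < 2:
        raise ValueError("need at least two patches to invert attention")
    attn = normalize(attention)
    inverse_attn = normalize([1.0 - a for a in attn])  # emphasize low-attention patches
    salient = l2_normalize(pool(patches, attn))
    overlooked = l2_normalize(pool(patches, inverse_attn))
    mixed = [alpha * s + (1.0 - alpha) * o for s, o in zip(salient, overlooked)]
    return l2_normalize(mixed)


def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Two toy patches: the first is salient (high attention), the second is background.
patches = [[1.0, 0.0], [0.0, 1.0]]
attention = [0.9, 0.1]
frame_embedding = dual_embedding(patches, attention, alpha=0.5)
```

Because nothing here is learned, the sketch is consistent with the abstract's claim that no additional training is required; retrieval would then rank frames by `cosine` similarity between a query embedding and each `frame_embedding`.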

Faisal Aljehrai, Mohammed A. Alkhrashi, Alreem Almuhrij, Sarah Abuhimed, Noorh Aldossary, Abdullah Aldwyish, Raied Aljadaany, Huda Alamri, Muhammad Kamran J Khan • 2026

Related benchmarks

Task             Dataset              Result          Rank
Semantic Search  Holy Mosque Dataset  Recall@5: 41    5
Semantic Search  Dense-Set            Recall@5: 42.5  5
Semantic Search  MS-COCO (test)       R@5: 66.3       5
