Unleash the Potential of CLIP for Video Highlight Detection

About

Multimodal models and large language models (LLMs) have transformed how open-world knowledge is used, unlocking new capabilities across a wide range of tasks and applications. The video domain in particular has benefited from these capabilities. In this paper, we present Highlight-CLIP (HL-CLIP), a method for video highlight detection that leverages the pre-trained knowledge embedded in multimodal models. By simply fine-tuning the multimodal encoder in combination with our saliency pooling technique, we achieve, to the best of our knowledge, state-of-the-art highlight detection performance on the QVHighlights benchmark.

Donghoon Han, Seunghyeon Seo, Eunhwan Park, Seong-Uk Nam, Nojun Kwak • 2024

Related benchmarks

Task                 Dataset               Result        Rank
Moment Retrieval     QVHighlights (test)   --            188
Highlight Detection  QVHighlights (test)   HIT@1 70.6    161
Moment Retrieval     QVHighlights (val)    --            61
Highlight Detection  QVHighlights (val)    HIT@1 72.4    45
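The HIT@1 figures above are percentages. As a rough illustration of how such a number might be computed, the sketch below assumes a common convention for this metric: a query counts as a hit when the clip with the highest predicted saliency score was rated at or above a threshold by at least one annotator. The function name `hit_at_1`, the input layout, and the default threshold of 4 are assumptions made for this example and are not taken from the paper.

```python
def hit_at_1(pred_scores, gt_ratings, threshold=4):
    """Percentage of queries whose top-scored clip is a ground-truth highlight.

    pred_scores: one list of per-clip predicted saliency scores per query.
    gt_ratings:  one list per query; each entry holds the annotator ratings
                 for the corresponding clip (hypothetical layout).
    threshold:   minimum rating for a clip to count as a highlight (assumed).
    """
    hits = 0
    for pred, ratings in zip(pred_scores, gt_ratings):
        # Index of the clip with the highest predicted score.
        top = max(range(len(pred)), key=pred.__getitem__)
        # Hit if any annotator rated that clip at or above the threshold.
        if any(r >= threshold for r in ratings[top]):
            hits += 1
    return 100.0 * hits / len(pred_scores)


# Two toy queries: the first top-scored clip is a highlight, the second is not.
preds = [[0.1, 0.9, 0.3], [0.8, 0.2]]
ratings = [[[1, 2, 0], [4, 3, 2], [0, 0, 1]], [[2, 3, 3], [4, 4, 4]]]
print(hit_at_1(preds, ratings))  # 50.0
```

Only the single top-ranked clip per query matters here, so the metric rewards getting the best clip right and ignores the ordering of the remaining clips.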
