Summarizing Videos with Attention

About

In this work we propose a novel method for supervised, keyshot-based video summarization that applies a conceptually simple and computationally efficient soft self-attention mechanism. Current state-of-the-art methods leverage bidirectional recurrent networks, such as BiLSTM, combined with attention. These networks are complex to implement and computationally demanding compared to fully connected networks. To that end, we propose a simple self-attention-based network for video summarization that performs the entire sequence-to-sequence transformation in a single feed-forward pass and a single backward pass during training. Our method sets new state-of-the-art results on two benchmarks commonly used in this domain, TvSum and SumMe.

Jiri Fajtl, Hajar Sadeghi Sokeh, Vasileios Argyriou, Dorothy Monekosso, Paolo Remagnino • 2018
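
The abstract describes a soft self-attention network that maps a whole sequence of frame features to per-frame outputs in one feed-forward pass, without recurrence. The listing does not give the exact architecture, so the following is only a minimal sketch of a generic soft self-attention pass in plain Python, not the authors' implementation. The scaled dot-product form, the single sigmoid output layer, the random weights, and all names (`self_attention_scores`, `w_q`, `w_k`, `w_v`, `w_out`) are assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned weights or extracted features."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention_scores(frames, w_q, w_k, w_v, w_out):
    """Map T frame features (T x D) to T importance scores in (0, 1).

    Every frame attends to every frame in one pass; there is no recurrence.
    """
    queries = matmul(frames, w_q)  # T x D
    keys = matmul(frames, w_k)     # T x D
    values = matmul(frames, w_v)   # T x D
    dim = len(keys[0])
    # Pairwise frame similarities; the sqrt(D) scaling is an assumption here.
    keys_t = [list(col) for col in zip(*keys)]  # D x T
    logits = matmul(queries, keys_t)            # T x T
    weights = [softmax([v / math.sqrt(dim) for v in row]) for row in logits]
    # Each output row is an attention-weighted mix of all frame values.
    context = matmul(weights, values)           # T x D
    # Assumed single-layer regressor: one sigmoid score per frame.
    raw = matmul(context, w_out)                # T x 1
    scores = [1.0 / (1.0 + math.exp(-r[0])) for r in raw]
    return weights, scores


rng = random.Random(0)
T, D = 6, 4  # 6 frames with 4-dimensional features (toy sizes)
frames = random_matrix(T, D, rng, scale=1.0)
w_q, w_k, w_v = (random_matrix(D, D, rng) for _ in range(3))
w_out = random_matrix(D, 1, rng)
weights, scores = self_attention_scores(frames, w_q, w_k, w_v, w_out)
```

In a keyshot-based pipeline, per-frame scores like these would typically be aggregated per shot and a subset of shots selected under a length budget; that selection step is not shown here.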

Related benchmarks

| Task | Dataset | Result | Rank |
|---|---|---|---|
| Video Summarization | TVSum | F-Measure 62.4 | 213 |
| Video Summarization | SumMe | F1 Score (Avg) 51.09 | 130 |
| Video Summarization | TVSum | Kendall's Tau 0.082 | 55 |
| Video Summarization | TVSum (test) | F-score 0.614 | 47 |
| Video Summarization | SumMe (test) | F-score 42.5 | 35 |
| Video Summarization | SumMe | Kendall's τ 0.16 | 32 |
| Video Summarization | SumMe | Kendall's tau 0.054 | 26 |
| Video Summarization | TVSum | Kendall's τ 0.16 | 24 |
| Video highlight detection | Mr.HiSum | mAP (rho=50%) 80 | 14 |
| Video Summarization | SumMe | F-Score 49.7 | 13 |
Showing 10 of 21 rows
