
Concepts in Motion: Temporal Concept Bottleneck Model for Interpretable Video Classification

About

Concept Bottleneck Models (CBMs) enable interpretable image classification by structuring predictions around human-understandable concepts, but extending this paradigm to video remains challenging due to the difficulty of extracting concepts and modeling them over time. In this paper, we introduce MoTIF (Moving Temporal Interpretable Framework), a transformer-based concept architecture that operates on sequences of temporally grounded concept activations and employs per-concept temporal self-attention to model when individual concepts recur and how their temporal patterns contribute to predictions. Central to the framework is a class-conditioned concept discovery module, based on a vision-language model (VLM), that extracts object- and action-centric textual concepts from training videos, yielding temporally expressive concept sets without manual concept annotation. Across multiple video benchmarks, this combination improves over global concept bottlenecks and remains competitive within the interpretable concept-bottleneck setting, while narrowing the gap to strong black-box video baselines, which we report as contextual references. Code is available at github.com/patrick-knab/MoTIF.
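The abstract's core mechanism, self-attention applied along the time axis separately for each concept followed by a linear concept-to-class layer, can be illustrated with a small pure-Python sketch. It is not the MoTIF implementation (the linked repository is the authoritative source): the scalar embedding, the single attention head, the max-pooling over time, the sharing of weights across concepts, and all names (`PerConceptTemporalAttention`, `linear_head`) are hypothetical choices made for illustration. The concept activations are assumed to arrive as a T × C matrix (T time windows, C concepts), and the weights are random rather than trained.

```python
import math
import random

# HYPOTHETICAL sketch, not the MoTIF code. Input is an assumed T x C matrix of
# concept activations (T time windows, C concepts); weights are random, not trained.


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(mat, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in mat]


def sinusoidal_position(t, d):
    """Sinusoidal positional encoding of time step t (d must be even)."""
    pe = []
    for i in range(0, d, 2):
        angle = t / (10000 ** (i / d))
        pe.extend([math.sin(angle), math.cos(angle)])
    return pe


class PerConceptTemporalAttention:
    """Single-head self-attention along time, run separately for every concept.

    Sharing one set of weights across concepts is a simplifying assumption.
    """

    def __init__(self, d=8, seed=0):
        if d % 2:
            raise ValueError("d must be even for the sinusoidal encoding")
        rng = random.Random(seed)

        def rand():
            return rng.gauss(0.0, 1.0 / math.sqrt(d))

        self.d = d
        self.w_embed = [rand() for _ in range(d)]
        self.w_q = [[rand() for _ in range(d)] for _ in range(d)]
        self.w_k = [[rand() for _ in range(d)] for _ in range(d)]
        self.w_v = [[rand() for _ in range(d)] for _ in range(d)]
        self.w_out = [rand() for _ in range(d)]

    def concept_score(self, series):
        """Map one concept's activation time series (length T) to a scalar score."""
        steps = range(len(series))
        # Embed each scalar activation and add its temporal position.
        h = [[a * w + p for w, p in zip(self.w_embed, sinusoidal_position(t, self.d))]
             for t, a in enumerate(series)]
        q = [matvec(self.w_q, x) for x in h]
        k = [matvec(self.w_k, x) for x in h]
        v = [matvec(self.w_v, x) for x in h]
        scale = math.sqrt(self.d)
        per_step = []
        for t in steps:
            # Step t attends only to the time steps of this same concept.
            alpha = softmax([dot(q[t], k[u]) / scale for u in steps])
            o_t = [sum(alpha[u] * v[u][j] for u in steps) for j in range(self.d)]
            per_step.append(dot(o_t, self.w_out))
        # Max-pool over time (an assumed choice): strongest evidence anywhere in the clip.
        return max(per_step)

    def __call__(self, activations):
        """activations: T rows x C columns -> list of C temporal concept scores."""
        num_concepts = len(activations[0])
        return [self.concept_score([row[c] for row in activations])
                for c in range(num_concepts)]


def linear_head(concept_scores, weights, bias):
    """Bottleneck head: logit_k = bias_k + sum_c weights[k][c] * concept_scores[c]."""
    return [b + dot(w_row, concept_scores) for w_row, b in zip(weights, bias)]


# Toy usage: T=4 windows, C=3 concepts, 2 classes (all numbers made up).
acts = [[0.1, 0.8, 0.0],
        [0.2, 0.7, 0.1],
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.1]]
attn = PerConceptTemporalAttention(d=8, seed=0)
scores = attn(acts)
logits = linear_head(scores, weights=[[1.0, -0.5, 0.0], [0.0, 1.0, 0.3]], bias=[0.0, 0.1])
print("concept scores:", [round(s, 3) for s in scores])
print("class logits:", [round(z, 3) for z in logits])
```

Because each concept's time series is attended to in isolation, a concept's score depends only on that concept's own activations over time, and the linear head keeps every class logit a readable weighted sum of concept scores. The pooling rule (max versus mean, for example), the number of heads, and whether weights are shared across concepts are all open design choices in this sketch.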

Patrick Knab, Sascha Marton, Philipp J. Schubert, Drago Guggiana, Christian Bartelt • 2025

Related benchmarks

Task                      Dataset                         Metric              Result  Rank
Video Action Recognition  UCF101                          Top-1 accuracy (%)  98.4    165
Video Action Recognition  HMDB51                          Top-1 accuracy (%)  83      130
Action Recognition        UCF101 (train-test)             Top-1 accuracy (%)  98.4    27
Video Action Recognition  SSv2                            Top-1 accuracy (%)  41.9    26
Action Recognition        HMDB51 (train-test)             Top-1 accuracy (%)  83      21
Action Recognition        SSv2 (train-test)               Top-1 accuracy (%)  41.9    21
Action Recognition        Breakfast Actions (train-test)  Top-1 accuracy (%)  87.5    20
Video Action Recognition  Breakfast                       Top-1 accuracy (%)  87.5    18
Action Recognition        HAA-100 (train-test)            Top-1 accuracy (%)  89.9    6
Action Recognition        HAA-500 (train-test)            Top-1 accuracy (%)  84.1    3
