
Cross-view Action Modeling, Learning and Recognition

About

Existing methods on video-based action recognition are generally view-dependent, i.e., performing recognition from the same views seen in the training data. We present a novel multiview spatio-temporal AND-OR graph (MST-AOG) representation for cross-view action recognition, i.e., the recognition is performed on the video from an unknown and unseen view. As a compositional model, MST-AOG compactly represents the hierarchical combinatorial structures of cross-view actions by explicitly modeling the geometry, appearance and motion variations. This paper proposes effective methods to learn the structure and parameters of MST-AOG. The inference based on MST-AOG enables action recognition from novel views. The training of MST-AOG takes advantage of the 3D human skeleton data obtained from Kinect cameras to avoid annotating enormous multi-view video frames, which is error-prone and time-consuming, but the recognition does not need 3D information and is based on 2D video input. A new Multiview Action3D dataset has been created and will be released. Extensive experiments have demonstrated that this new action representation significantly improves the accuracy and robustness for cross-view action recognition on 2D videos.

Jiang Wang, Xiaohan Nie, Yin Xia, Ying Wu, Song-Chun Zhu • 2014

Related benchmarks

Task | Dataset | Result | Rank
Action Recognition | Northwestern-UCLA (NUCLA) Multiview (cross-view) | Mean Accuracy 45.2 | 45
Action Recognition | Penn-Action (test) | Accuracy 74 | 27
Action Recognition | UWA3D Multiview-II (V1,2^4) | Accuracy 39.7 | 20
Action Recognition | UWA3D Multiview-II (V3,4^2) | Accuracy 44.2 | 20
Action Recognition | UWA3D Multiview-II (V1,2^3) | Accuracy 47.3 | 20
Action Recognition | UWA3D Multiview-II V1,4^3 | Accuracy 0.422 | 20
Action Recognition | UWA3D Multiview-II (V3,4^1) | Accuracy 51 | 20
Action Recognition | UWA3D Multiview-II (V2,3^4) | Accuracy 43.2 | 20
Action Recognition | UWA3D Multiview-II | Accuracy 42.3 | 20
Action Recognition | UWA3D Multiview-II (V1,3^2) | Accuracy 43 | 20
Showing 10 of 16 rows
