
IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text

About

We present IMU2CLIP, a novel pre-training approach that aligns Inertial Measurement Unit (IMU) motion sensor recordings with video and text by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos, while preserving the transitivity across these modalities. We explore several new IMU-based applications that IMU2CLIP enables, such as motion-based media retrieval and natural language reasoning tasks with motion data. In addition, we show that IMU2CLIP can significantly improve downstream performance when fine-tuned for each application (e.g., activity recognition), demonstrating the universal usage of IMU2CLIP as a new pre-trained resource. Our code will be made publicly available.
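The core idea above is contrastive alignment: matched IMU/video pairs should have high similarity in the shared embedding space, and mismatched pairs low similarity. The sketch below illustrates this with a symmetric InfoNCE-style loss over a toy batch of paired embeddings; the embedding dimension, the pure-Python vectors, and the helper names are hypothetical stand-ins for the paper's learned IMU encoder and frozen CLIP embeddings, not the authors' implementation.

```python
import math
import random

EMB_DIM = 4  # hypothetical toy dimension; CLIP embeddings are much larger


def normalize(v):
    """L2-normalize a vector so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(imu_embs, clip_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair i in imu_embs matches pair i in clip_embs.

    Each embedding is contrasted against every embedding of the other
    modality in the batch; the matched index is the positive.
    """
    imu = [normalize(v) for v in imu_embs]
    vid = [normalize(v) for v in clip_embs]
    n = len(imu)
    total = 0.0
    for i in range(n):
        for anchors, targets in ((imu, vid), (vid, imu)):  # both directions
            logits = [
                sum(a * b for a, b in zip(anchors[i], targets[j])) / temperature
                for j in range(n)
            ]
            m = max(logits)  # subtract max for numerical stability
            log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_denom - logits[i]  # -log softmax of the positive
    return total / (2 * n)


if __name__ == "__main__":
    random.seed(0)
    batch = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(3)]
    # Perfectly aligned pairs give a low loss; shuffled pairs a higher one.
    print(info_nce(batch, batch), info_nce(batch, batch[1:] + batch[:1]))
```

In training, gradients from this loss would update only the IMU encoder while the CLIP video/text encoders stay frozen, pulling IMU embeddings toward the pre-trained CLIP space.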

Seungwhan Moon, Andrea Madotto, Zhaojiang Lin, Alireza Dirafzoon, Aparajita Saraf, Amy Bearman, Babak Damavandi • 2022

Related benchmarks

Task | Dataset | Metric | Result | Rank
Activity Recognition | PAMAP2 | Accuracy | 1.9 | 22
Activity Recognition | mHealth | F1 Score | 1.3 | 17
Human Activity Recognition | TotalCapture | Accuracy | 68 | 16
Human Activity Recognition | MRI | Accuracy | 85 | 16
Activity Recognition | UTD-MHAD | Accuracy | 3.7 | 9
Activity Recognition | MMAct | Accuracy | 6.4 | 9
Activity Recognition | UCI-HAR | Accuracy | 17.8 | 9
Activity Recognition | USC-HAD | Accuracy | 12.8 | 9
Activity Recognition | MotionSense | Accuracy | 15.5 | 9
Activity Recognition | Shoaib | Accuracy | 16.7 | 9

Showing 10 of 32 rows
