SensorLLM: Aligning Large Language Models with Motion Sensors for Human Activity Recognition
About
We introduce SensorLLM, a two-stage framework that enables Large Language Models (LLMs) to perform human activity recognition (HAR) from sensor time-series data. Despite their strong reasoning and generalization capabilities, LLMs remain underutilized for motion sensor data because time-series data lack semantic context, computation is constrained, and LLMs struggle to process numerical inputs. SensorLLM addresses these limitations through a Sensor-Language Alignment stage, in which the model aligns sensor inputs with trend descriptions and special tokens mark channel boundaries. This alignment enables LLMs to capture numerical variations, channel-specific features, and data of varying durations, without requiring human annotations. In the subsequent Task-Aware Tuning stage, we refine the model for HAR classification, achieving performance that matches or surpasses state-of-the-art methods. Our results demonstrate that SensorLLM evolves into an effective sensor learner, reasoner, and classifier through human-intuitive Sensor-Language Alignment, generalizing across diverse HAR datasets. We believe this work establishes a foundation for future research on time-series and text alignment, paving the way for foundation models in sensor data analysis. Our code is available at https://github.com/zechenli03/SensorLLM.
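To make the idea of aligning sensor readings with trend descriptions more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple rule (the sign of consecutive differences) for labeling trends, and the function names, the tolerance `tol`, and the `<name_start>` / `<name_end>` token format are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def trend_segments(values: List[float], tol: float = 1e-6) -> List[Tuple[int, int, str]]:
    """Group consecutive steps of one channel into (start, end, trend) runs.

    Assumption: a step is 'increasing' or 'decreasing' when the difference
    between neighbours exceeds `tol` in magnitude, otherwise 'steady'.
    """
    segments: List[Tuple[int, int, str]] = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta > tol:
            trend = "increasing"
        elif delta < -tol:
            trend = "decreasing"
        else:
            trend = "steady"
        if segments and segments[-1][2] == trend:
            # Same trend as the previous step: extend the current run.
            start, _, _ = segments[-1]
            segments[-1] = (start, i, trend)
        else:
            # Trend changed: open a new run covering this single step.
            segments.append((i - 1, i, trend))
    return segments


def describe_trends(values: List[float]) -> str:
    """Render the runs as a short text description, generated without human labels."""
    return "; ".join(f"steps {s}-{e}: {t}" for s, e, t in trend_segments(values))


def wrap_channel(name: str, values: List[float]) -> str:
    """Mark one channel's boundaries with hypothetical start/end tokens."""
    body = " ".join(f"{v:.2f}" for v in values)
    return f"<{name}_start> {body} <{name}_end>"


if __name__ == "__main__":
    x_acc = [0.0, 1.0, 2.0, 2.0, 1.0]
    print(wrap_channel("x_acc", x_acc))
    print(describe_trends(x_acc))
```

Pairs of wrapped channel text and automatically generated descriptions of this kind could serve as alignment data. How SensorLLM actually encodes sensor values and which tokens it uses should be checked against the linked repository.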
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Activity Recognition | UCI-HAR | Accuracy | 90.8 | 86 |
| Human Activity Recognition | PAMAP2 | Accuracy | 87.2 | 54 |
| Activity Recognition | mHealth | F1 Score | 89.4 | 35 |
| Human Activity Recognition | USC-HAD | Macro F1 | 61.2 | 24 |
| Action Captioning | XRF IMU v2 (test) | BLEU@4 | 72.3 | 16 |
| Action Captioning | UWash (test) | B@4 | 0.828 | 16 |
| Action Captioning | XRF Wi-Fi v2 (test) | BLEU@4 | 0.392 | 15 |
| Action Captioning | WiFiTAD (test) | B@4 | 44.2 | 15 |