
MSGL-Transformer: A Multi-Scale Global-Local Transformer for Rodent Social Behavior Recognition

About

Recognition of rodent behavior is important for understanding neural and behavioral mechanisms. Traditional manual scoring is time-consuming and prone to human error. We propose MSGL-Transformer, a Multi-Scale Global-Local Transformer for recognizing rodent social behaviors from pose-based temporal sequences. The model employs a lightweight transformer encoder with multi-scale attention to capture motion dynamics across different temporal scales. The architecture integrates parallel short-range, medium-range, and global attention branches to explicitly capture behavior dynamics at multiple temporal scales. We also introduce a Behavior-Aware Modulation (BAM) block, inspired by SE-Networks, which modulates temporal embeddings to emphasize behavior-relevant features prior to attention. We evaluate on two datasets: RatSI (5 behavior classes, 12D pose inputs) and CalMS21 (4 behavior classes, 28D pose inputs). On RatSI, MSGL-Transformer achieves 75.4% mean accuracy and F1-score of 0.745 across nine cross-validation splits, outperforming TCN, LSTM, and Bi-LSTM. On CalMS21, it achieves 87.1% accuracy and F1-score of 0.8745, a +10.7% improvement over HSTWFormer, and outperforms ST-GCN, MS-G3D, CTR-GCN, and STGAT. The same architecture generalizes across both datasets with only input dimensionality and number of classes adjusted.
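The two architectural ideas in the abstract can be illustrated with a minimal NumPy sketch: an SE-style gate standing in for the Behavior-Aware Modulation (BAM) block, followed by three parallel self-attention branches that differ only in how far each frame may attend (short window, medium window, global). All function names, window sizes, and dimensions below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def behavior_aware_modulation(x, w1, w2):
    # SE-style gating (hypothetical stand-in for BAM): squeeze over time,
    # then excite per feature channel before attention is applied.
    s = x.mean(axis=0)                                    # (D,) temporal squeeze
    g = 1.0 / (1.0 + np.exp(-(np.maximum(s @ w1, 0) @ w2)))  # sigmoid gate, (D,)
    return x * g                                          # reweight channels

def windowed_attention(x, window):
    # x: (T, D) frame embeddings; window=None means global attention.
    T, D = x.shape
    scores = x @ x.T / np.sqrt(D)                         # (T, T) similarities
    if window is not None:
        idx = np.arange(T)
        mask = np.abs(idx[:, None] - idx[None, :]) > window
        scores = np.where(mask, -1e9, scores)             # block distant frames
    return softmax(scores, axis=-1) @ x                   # (T, D)

def multi_scale_attention(x, short=2, medium=8):
    # Concatenate short-range, medium-range, and global branches.
    branches = [windowed_attention(x, w) for w in (short, medium, None)]
    return np.concatenate(branches, axis=-1)              # (T, 3*D)

rng = np.random.default_rng(0)
seq = rng.standard_normal((16, 12))       # 16 frames of 12-D pose features (RatSI-like)
w1 = rng.standard_normal((12, 3))         # squeeze weights (reduction ratio 4)
w2 = rng.standard_normal((3, 12))         # excite weights
out = multi_scale_attention(behavior_aware_modulation(seq, w1, w2))
print(out.shape)                          # (16, 36)
```

The masked-score trick is the standard way to restrict attention span; swapping the window sizes changes which temporal scale each branch captures, while the global branch (no mask) sees the whole sequence.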

Muhammad Imran Sharif, Doina Caragea • 2026

Related benchmarks

Task | Dataset | Result | Rank
Social interaction classification | RatSI (val-8-test-2) | Accuracy 81.48 | 8
Behavior Recognition | CalMS21 Task 1 (test) | Avg Per-Class Accuracy 87.1 | 6
Behavior Classification | CalMS21 Task 1 (test) | Accuracy 87.09 | 4
Social Behavior Classification | CalMS21 Task 1 (test) | Accuracy 87.09 | 4
