DETRPose: Real-time end-to-end transformer model for multi-person pose estimation

About

Multi-person pose estimation (MPPE) estimates keypoints for all individuals present in an image. MPPE is a fundamental task for several applications in computer vision and virtual reality. Unfortunately, there are currently no transformer-based models that can perform MPPE in real time. This paper presents a family of transformer-based models capable of performing multi-person 2D pose estimation in real time. Our approach uses a modified decoder architecture and keypoint similarity metrics to generate both positive and negative queries, thereby enhancing the quality of the selected queries within the architecture. Compared to state-of-the-art models, our proposed models train much faster, using 5 to 10 times fewer epochs, and achieve competitive inference times without requiring quantization libraries to speed up the model. Furthermore, our proposed models match or outperform alternative models, often with significantly fewer parameters.

Sebastian Janampa, Marios Pattichis • 2025

Related benchmarks

Task                            Dataset                 Result     Rank
Pose Estimation                 COCO (val)              AP 73.3    319
Multi-person Pose Estimation    CrowdPose (test)        AP 75.1    177
Multi-person Pose Estimation    COCO 2017 (test-dev)    AP 72.2    99
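
The abstract mentions keypoint similarity metrics but does not name one. As a point of reference only, the sketch below computes Object Keypoint Similarity (OKS), the measure underlying COCO keypoint AP. It is an assumption that this is the kind of metric meant, and it is not the authors' implementation. The function name `oks` and its arguments are hypothetical; the per-keypoint constants are the ones published with the COCO keypoint evaluation.

```python
import numpy as np

# Per-keypoint falloff constants from the COCO keypoint evaluation
# (17 keypoints: nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([
    .26, .25, .25, .35, .35, .79, .79, .72, .72,
    .62, .62, 1.07, 1.07, .87, .87, .89, .89,
]) / 10.0


def oks(pred_xy, gt_xy, gt_visible, area, sigmas=COCO_SIGMAS):
    """Illustrative OKS between one predicted and one ground-truth pose.

    pred_xy, gt_xy: arrays of shape (K, 2) with keypoint coordinates in pixels.
    gt_visible:     array of shape (K,); values > 0 mark labeled keypoints.
    area:           ground-truth object area in pixels^2 (the scale term s^2).
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    gt_xy = np.asarray(gt_xy, dtype=float)
    labeled = np.asarray(gt_visible) > 0
    if not labeled.any():
        # Simplification for this sketch: no labeled keypoints means no score.
        return 0.0
    d2 = ((pred_xy - gt_xy) ** 2).sum(axis=1)        # squared pixel distances
    k2 = (2.0 * np.asarray(sigmas, dtype=float)) ** 2  # per-keypoint constants k_i^2
    e = d2 / (2.0 * area * k2 + np.finfo(float).eps)
    # Average the Gaussian-shaped similarity over labeled keypoints only.
    return float(np.exp(-e)[labeled].mean())
```

A perfect prediction scores 1.0 and the score decays toward 0 as keypoints drift from the ground truth, with the tolerance scaled by object area and by keypoint type.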
