SDPose: Exploiting Diffusion Priors for Out-of-Domain and Robust Pose Estimation
About
Pre-trained diffusion models provide rich latent features across U-Net levels and are emerging as powerful vision backbones. While prior works such as Marigold and Lotus repurpose diffusion priors for dense geometric perception tasks such as depth and surface normal estimation, their potential for cross-domain human pose estimation remains largely unexplored. Through a systematic analysis of latent features from different upsampling levels of the Stable Diffusion U-Net, we identify the levels that deliver the strongest robustness and cross-domain generalization for pose estimation. Building on these findings, we propose SDPose, which (i) extracts U-Net features from the selected upsampling blocks, (ii) fuses them with a lightweight feature aggregation module to form a robust representation, and (iii) jointly optimizes a keypoint heatmap loss and an auxiliary latent reconstruction loss, which regularizes training and preserves the pre-trained generative prior. To evaluate cross-domain generalization and robustness, we construct COCO-OOD, a COCO-based benchmark with four subsets: three style-transferred splits to assess domain shift, and one corruption split (noise, weather, digital artifacts, and blur) to test robustness. With a shorter fine-tuning schedule, SDPose achieves performance comparable to Sapiens on COCO, surpasses Sapiens-1B on COCO-WholeBody, and establishes new state-of-the-art results on HumanArt and COCO-OOD.
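Step (iii) combines two supervision signals into one training objective. The sketch below illustrates that idea as a weighted sum, under assumptions the abstract does not state. Both terms are assumed to be mean squared errors. Keypoint targets are assumed to be unnormalized 2D Gaussian heatmaps, a common choice in top-down pose estimation. The reconstruction target is assumed to be the input's latent. The names `gaussian_heatmap`, `joint_loss`, and `recon_weight`, along with the default weight of 0.1, are hypothetical. Plain Python lists are used to keep the example self-contained; a real implementation would operate on batched tensors.

```python
import math
from typing import List

Grid = List[List[float]]  # one 2D map (a heatmap or a single latent channel)


def gaussian_heatmap(h: int, w: int, cx: float, cy: float, sigma: float = 2.0) -> Grid:
    """Assumed target: an unnormalized 2D Gaussian peaking at 1.0 on keypoint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(w)] for y in range(h)]


def mse(a: Grid, b: Grid) -> float:
    """Mean squared error between two equally sized 2D grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    total = sum((va - vb) ** 2 for ra, rb in zip(a, b) for va, vb in zip(ra, rb))
    return total / sum(len(row) for row in a)


def mean_mse(xs: List[Grid], ys: List[Grid]) -> float:
    """MSE averaged over a stack of maps (keypoints or latent channels)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("stacks must be non-empty and the same length")
    return sum(mse(x, y) for x, y in zip(xs, ys)) / len(xs)


def joint_loss(pred_heatmaps: List[Grid], target_heatmaps: List[Grid],
               recon_latent: List[Grid], input_latent: List[Grid],
               recon_weight: float = 0.1) -> float:
    """Hypothetical objective: heatmap MSE plus a weighted latent reconstruction MSE."""
    heatmap_loss = mean_mse(pred_heatmaps, target_heatmaps)
    recon_loss = mean_mse(recon_latent, input_latent)
    return heatmap_loss + recon_weight * recon_loss


# Usage: a perfect heatmap prediction and a perfect latent reconstruction give zero loss.
target = [gaussian_heatmap(8, 8, cx=3.0, cy=4.0)]
latent = [[[0.5] * 4 for _ in range(4)] for _ in range(4)]  # 4 channels of 4x4
print(joint_loss(target, target, latent, latent))  # 0.0
```

In this sketch the reconstruction term acts only as a regularizer. Setting `recon_weight` to zero reduces the objective to plain heatmap regression. Larger values push the network to keep information the generative prior can decode. How SDPose actually weights the two terms is not stated in the abstract.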
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Human Pose Estimation | COCO (val) | AP | 81.2 | 57 |
| Whole-body Pose Estimation | COCO-WholeBody (val) | Whole AP | 72.8 | 25 |
| Human Keypoint Estimation | Human-Art (val) | AP | 71.8 | 19 |
| Body Pose Estimation | HumanART | AP | 71.8 | 4 |
| Whole-body Pose Estimation | COCO-OOD-Monet Wholebody (val) | Body AP | 61.3 | 4 |
| Wholebody Pose Estimation | COCO-OOD-Monet (val) | Left Hand AP | 48.8 | 4 |
| Body Pose Estimation | COCO-OOD Monet (Body) | AP | 64 | 3 |
| Body Pose Estimation | COCO-OOD Ukiyo-e (Body) | AP | 66.1 | 3 |
| Wholebody Pose Estimation | COCO-OOD Ukiyo-e Wholebody | AP | 50 | 3 |
| Wholebody Pose Estimation | COCO-OOD Corruption Wholebody | AP | 54.3 | 3 |