PPGuide: Steering Diffusion Policies with Performance Predictive Guidance
About
Diffusion policies have been shown to be very efficient at learning complex, multi-modal behaviors for robotic manipulation. However, errors in generated action sequences can compound over time, potentially leading to failure. Some approaches mitigate this by augmenting datasets with expert demonstrations or by learning predictive world models, both of which can be computationally expensive. We introduce Performance Predictive Guidance (PPGuide), a lightweight, classifier-based framework that steers a pre-trained diffusion policy away from failure modes at inference time. PPGuide relies on a novel self-supervised process: it uses attention-based multiple instance learning to automatically estimate which observation-action chunks from the policy's rollouts are relevant to success or failure. We then train a performance predictor on this self-labeled data. During inference, this predictor provides a real-time gradient that guides the policy toward more robust actions. We validate PPGuide across a diverse set of tasks from the Robomimic and MimicGen benchmarks, demonstrating consistent improvements in performance.
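The two mechanisms named in the abstract, attention-based multiple instance learning over rollout chunks and gradient guidance from a performance predictor, can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not specify architectures or the exact guidance rule, so everything below is an assumption made for illustration: the function names, the linear attention scorer `w`, the linear success predictor `theta`, and the update `action + scale * gradient` applied to a denoised action (the usual classifier-guidance form).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mil_weights(chunk_features, w):
    """Attention weights over the chunks (instances) of one rollout (bag).

    Each chunk is scored by a dot product with a hypothetical learned
    vector `w`; higher weight means the chunk is treated as more relevant
    to the rollout's success/failure label.
    """
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in chunk_features]
    return softmax(scores)

def success_probability(action, theta):
    """Toy performance predictor: sigmoid of a linear logit (assumption)."""
    logit = sum(a * t for a, t in zip(action, theta))
    return 1.0 / (1.0 + math.exp(-logit))

def guidance_gradient(action, theta):
    """Gradient of log p(success | action) with respect to the action.

    For a linear logit z = theta . action, d/dz log sigmoid(z) = 1 - sigmoid(z),
    so the gradient is (1 - p) * theta. A neural predictor would use autodiff.
    """
    p = success_probability(action, theta)
    return [(1.0 - p) * t for t in theta]

def guided_update(denoised_action, theta, scale):
    """Nudge a denoised action along the predictor's gradient (assumed rule)."""
    grad = guidance_gradient(denoised_action, theta)
    return [a + scale * g for a, g in zip(denoised_action, grad)]

# Illustrative values only: three chunks with two features each.
weights = attention_mil_weights([[0.1, 0.0], [2.0, 1.0], [0.3, 0.2]], w=[1.0, 1.0])
steered = guided_update([0.0, 0.0], theta=[1.0, -2.0], scale=0.5)
```

In this toy setting the second chunk receives the largest attention weight, and the steered action has a higher predicted success probability than the original one. The guidance scale trades off staying close to the policy's own sample against following the predictor.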
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Coffee Making/Handling | Robomimic MimicGen Coffee (D2) | Success Rate | 60 | 25 |
| Mug Cleanup | Robomimic MimicGen Mug Cleanup (D1) | Success Rate | 36 | 20 |
| Coffee Preparation | Robomimic/MimicGen Coffee Prep. (D1) | Success Rate | 24 | 20 |
| Kitchen manipulation | Robomimic/MimicGen Kitchen (D1) | Success Rate | 54 | 10 |
| Object Transport | Robomimic Transport | Success Rate | 76 | 10 |
| Square Nut Insertion | Robomimic Square | Success Rate | 72 | 10 |
| Stacking | Robomimic MimicGen Stack D1 | Success Rate | 94 | 10 |
| Stacking Three Blocks | Robomimic MimicGen Stack Three (D1) | Success Rate | 36 | 10 |
| Robot Manipulation | square | Success Rate | 70 | 8 |
| Robot Manipulation | Transport | Success Rate | 74 | 8 |