Hand-Object Interaction Image Generation
About
In this work, we introduce a new task, hand-object interaction image generation, which aims to generate a hand-object image conditioned on the given hand, object, and their interaction status. The task is challenging and has many potential applications, such as AR/VR games and online shopping. To address it, we propose a novel framework, HOGAN, which uses an expressive model-aware hand-object representation and leverages its inherent topology to build a unified surface space. In this space, we explicitly model the complex self-occlusion and mutual occlusion that arise during interaction. During final image synthesis, we account for the different characteristics of the hand and the object and generate the target image in a split-and-combine manner. For evaluation, we build a comprehensive protocol to assess both the fidelity and the structure preservation of the generated image. Extensive experiments on two large-scale datasets, HO3Dv3 and DexYCB, demonstrate the effectiveness and superiority of our framework both quantitatively and qualitatively. The project page is available at https://play-with-hoi-generation.github.io/.
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| 3D Hand Reconstruction | HO3D v3 | PA-MPJPE | 11.8 | 18 |
| Hand-object interaction image generation | DexYCB | FID | 30.1 | 7 |
| Hand-object manipulation video generation | DexYCB | FID | 64.74 | 4 |
| Object Structure Preservation | HO3D v3 | Sugar Box Preservation Score | 52.1 | 4 |
| Hand-object interaction image generation | HO3D v3 | FID | 41.3 | 3 |
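Several rows above report FID (Fréchet Inception Distance), which compares Gaussian statistics of features extracted from real and generated images: the squared distance between the means plus a trace term over the covariances. The following is a minimal sketch of that formula only, not the evaluation code behind these numbers. In practice the features usually come from a pretrained Inception network; here they are arbitrary arrays, and the function names `gaussian_stats`, `frechet_distance`, and `fid_from_features` are hypothetical.

```python
import numpy as np


def gaussian_stats(features):
    """Return the mean vector and covariance matrix of an (N, D) feature array."""
    features = np.asarray(features, dtype=np.float64)
    mu = features.mean(axis=0)
    sigma = np.atleast_2d(np.cov(features, rowvar=False))
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))
    """
    diff = mu1 - mu2
    # The product of two positive semi-definite matrices has real,
    # non-negative eigenvalues, so the trace of its square root is the sum
    # of their square roots. Clipping removes tiny negative numerical noise.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(real_features, generated_features):
    """FID-style score from two (N, D) feature arrays (assumed pre-extracted)."""
    mu_r, sigma_r = gaussian_stats(real_features)
    mu_g, sigma_g = gaussian_stats(generated_features)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)
```

Lower values indicate that the generated feature distribution is closer to the real one. Scores are only comparable when the feature extractor and sample counts match, so this sketch should not be expected to reproduce the values in the table.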