Gaussian Pixel Codec Avatars: A Hybrid Representation for Efficient Rendering
About
We present Gaussian Pixel Codec Avatars (GPiCA), photorealistic head avatars that can be generated from multi-view images and efficiently rendered on mobile devices. GPiCA uses a hybrid representation that combines a triangle mesh with anisotropic 3D Gaussians. This combination maximizes memory and rendering efficiency while maintaining a photorealistic appearance. The triangle mesh efficiently represents surface areas such as facial skin, while the 3D Gaussians handle non-surface areas such as hair and beard. To render the two together, we develop a unified differentiable rendering pipeline that treats the mesh as a semi-transparent layer within the volumetric rendering paradigm of 3D Gaussian Splatting. We train neural networks to decode a facial expression code into three components: a 3D face mesh, an RGBA texture, and a set of 3D Gaussians. These components are rendered simultaneously in a unified rendering engine. The networks are trained using multi-view image supervision. Our results demonstrate that GPiCA achieves the realism of purely Gaussian-based avatars while matching the rendering performance of mesh-based avatars.
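The idea of treating the mesh as a semi-transparent layer inside a splatting-style renderer can be illustrated with a small per-pixel sketch. This is not the authors' implementation: the `Fragment` type, the `composite` function, and the sample values are hypothetical, and the sketch assumes standard front-to-back alpha compositing in which the textured mesh contributes one more depth-sorted sample alongside the Gaussians.

```python
from dataclasses import dataclass
from typing import List, Tuple

Color = Tuple[float, float, float]


@dataclass
class Fragment:
    """One sample along a pixel's ray (hypothetical type for illustration)."""
    depth: float  # distance from the camera along the ray
    color: Color  # RGB in [0, 1]
    alpha: float  # opacity in [0, 1]


def composite(fragments: List[Fragment],
              background: Color = (0.0, 0.0, 0.0)) -> Color:
    """Front-to-back alpha compositing of fragments sorted by depth."""
    out = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still passing through
    for frag in sorted(fragments, key=lambda f: f.depth):
        weight = transmittance * frag.alpha
        for c in range(3):
            out[c] += weight * frag.color[c]
        transmittance *= 1.0 - frag.alpha
    # Whatever light is left reaches the background.
    for c in range(3):
        out[c] += transmittance * background[c]
    return (out[0], out[1], out[2])


# Assumed sample values: a hair Gaussian in front of the skin mesh,
# and another Gaussian behind it.
hair = Fragment(depth=1.0, color=(0.2, 0.1, 0.0), alpha=0.5)
skin = Fragment(depth=2.0, color=(0.8, 0.6, 0.5), alpha=1.0)  # mesh texel from the RGBA texture
behind = Fragment(depth=3.0, color=(1.0, 1.0, 1.0), alpha=0.9)

pixel = composite([behind, skin, hair])
```

Because the mesh texel here is fully opaque, the Gaussian behind it receives zero weight, while the hair Gaussian in front blends with the skin color. Lowering the mesh alpha (for example near a hairline) would let samples behind it contribute, which is the behavior a semi-transparent mesh layer is meant to allow.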
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| 3D Avatar Rendering | Quest 3 Mobile Benchmark (test) | LPIPS | 0.33 | 4 |
| Head Avatar Reconstruction | Face Dataset (Subject 1) | MAE | 7.65 | 4 |
| Head Avatar Reconstruction | Face Dataset (Subject 2) | MAE | 5.7 | 4 |
| Head Avatar Reconstruction | Face Dataset (Subject 3) | MAE | 6.13 | 4 |
| Head Avatar Reconstruction | Face Dataset (Subject 4) | MAE | 6.04 | 4 |
| Head Avatar Reconstruction | Face Dataset (Subject 5) | MAE | 6.21 | 4 |
| Avatar Reconstruction | full body dataset | MAE | 2.85 | 4 |