| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Low-shot recognition | Toys4K (test) | Accuracy | 84.13 | 72 |
| Text-to-3D | Toys4k | CLIP Score | 35.72 | 25 |
| 3D Generation | Toys4K | CLIP Score | 92.97 | 16 |
| Mesh Reconstruction | Toys4K | Chamfer Distance | 0.033 | 16 |
| Low-shot recognition | Toys4k multi-object setting (test) | LSA | 60.49 | 15 |
| 3D Reconstruction | Toys4k (Preserved Part) | Appearance PSNR | 25.46 | 14 |
| 3D Inpainting | Toys4k Inpainting Part | CLIP Score | 30.61 | 14 |
| Refining VFM-derived artifacts | Toys4k | mIoU | 44.6 | 13 |
| Text-to-3D Generation | Toys4K CL (Base) | CLIP Similarity | 29.6 | 12 |
| 3D Geometry Synthesis | Toys4K (test) | Throughput (iter/s) | 0.6426 | 12 |
| Image-to-3D | Toys4k | FD (Inception) | 6.216 | 11 |
| Text-to-3D Generation | Toys4K-CL Forgetting | CLIP Similarity | 17.36 | 10 |
| Text-to-3D Generation | Toys4K CL (All) | CLIP Similarity | 29.51 | 10 |
| Text-to-3D Generation | Toys4K CL (Novel) | CLIP Similarity | 29.86 | 10 |
| Multi-object Category Recognition (Categ-MObj) | Toys4k multi-object setting | LSA | 60.49 | 10 |
| Shape-conditioned 3D object generation (geometric primitives) | Toys4K generalization (test) | CD | 4.89 | 9 |
| Single-view 3D Reconstruction | Toys4k (test) | PSNR | 28.144 | 8 |
| Appearance Reconstruction | Toys4k Preserved Part | PSNR | 22.8 | 7 |
| Mesh Generation | Toys4k (Artist Meshes) | Chamfer Distance (CD) | 0.038 | 7 |
| Mesh Reconstruction | Toys4k Artist Meshes (test) | Chamfer Distance (CD) | 0.038 | 7 |
| Low-shot object recognition | Toys4k Inst-SObj | Accuracy | 96.5 | 6 |
| Low-shot recognition | Toys4k Categ-SObj 1.0 (test) | Accuracy | 79.69 | 6 |
| Low-shot recognition | Toys4k Inst-SObj 1.0 (test) | Accuracy | 96.5 | 6 |
| Zero-shot text-to-image retrieval | Toys4K | Recall@1 | 7.42 | 5 |
| Single-view 3D Generation | Toys4K | IoU | 93.57 | 5 |