| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Audio Deepfake Detection | In-the-wild | EER | 0.8 | 64 |
| Illumination Estimation | in-the-wild Samsung | Mean Angular Error | 2.07 | 13 |
| Illumination Estimation | in-the-wild Pixel | Mean Angular Error (°) | 1.89 | 13 |
| Face retouching | In-the-wild 1,000 internet portraits (test) | NIQE | 11.821 | 11 |
| Pixel-level Forgery Localization | In-the-wild | F1 Score | 69.18 | 11 |
| Image-level forgery detection | In-the-wild | F1 Score | 100 | 11 |
| Bounding Box Localization | In-the-wild | BBox IoU | 63.54 | 10 |
| Ordinal consistency | In-the-wild 100 steps horizon v1 (test) | Kendall's Tau | 0.61 | 8 |
| Ordinal consistency | In-the-wild 50 steps horizon v1 (test) | Kendall's Tau | 0.69 | 8 |
| Voice Anti-spoofing | In-the-Wild (test) | EER | 6.71 | 7 |
| Face Reenactment | in the wild | AU % | 82.3 | 7 |
| 3D Avatar Reconstruction | In-the-Wild (test) | L1 Loss | 0.008 | 6 |
| Synthetic Speech Detection | In-The-Wild (ITW) | EER | 15.39 | 6 |
| Novel View Synthesis | In-the-Wild Composite | PSNR | 26.94 | 6 |
| Novel View Synthesis | In-the-wild data | PSNR | 29.26 | 6 |
| Dynamic Scene Generation | In-the-wild dataset | PhysReal | 0.82 | 5 |
| Music-driven 2D dance generation | In-the-Wild leakage-free (test) | FID | 45.2 | 5 |
| 6D Object Tracking | In the wild instructional and egocentric videos | Relative Depth | 0.08 | 5 |
| Face Tracking and Reconstruction | in-the-wild (test) | L2 Distance | 12.59 | 5 |
| 3D Human Reconstruction | in-the-wild | CLIP-I Score | 0.972 | 4 |
| Virtual Try-On | In-The-Wild Dataset | Texture Score | 96.73 | 4 |
| HOI Video Generation | in-the-wild dataset (test) | Fréchet Video Distance (FVD) | 484 | 4 |
| Image-to-image relighting | In-the-wild Stage-wise Study | Lighting Alignment | 75 | 4 |
| Image-to-image relighting | In-the-wild Comparison Study | Lighting Alignment | 0.931 | 3 |
| 3D Human Reconstruction | In-the-wild Fashion images | Preference Rate (vs ECON) | 0.551 | 3 |