Visual Personalization Turing Test
About
We introduce the Visual Personalization Turing Test (VPTT), a new paradigm for evaluating contextual visual personalization based on perceptual indistinguishability, rather than identity replication. A model passes the VPTT if its output (image, video, 3D asset, etc.) is indistinguishable to a human or calibrated VLM judge from content a given person might plausibly create or share. To operationalize VPTT, we present the VPTT Framework, integrating a 10k-persona benchmark (VPTT-Bench), a visual retrieval-augmented generator (VPRAG), and the VPTT Score, a text-only metric calibrated against human and VLM judgments. We show high correlation across human, VLM, and VPTT evaluations, validating the VPTT Score as a reliable perceptual proxy. Experiments demonstrate that VPRAG achieves the best alignment-originality balance, offering a scalable and privacy-safe foundation for personalized generative AI.
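The agreement claim above (high correlation across human, VLM, and VPTT Score evaluations) can be illustrated with a minimal sketch. It assumes Spearman rank correlation as the agreement measure, which the abstract does not specify. The scores are hypothetical toy values, not results from the paper, and the helper names `ranks`, `pearson`, and `spearman` are illustrative only.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-output scores from three evaluators (assumed, not from the paper).
scores = {
    "human": [0.9, 0.4, 0.7, 0.2, 0.6],
    "vlm": [0.8, 0.5, 0.6, 0.1, 0.7],
    "vptt_score": [0.85, 0.35, 0.65, 0.3, 0.6],
}

# Correlation for every pair of evaluators.
pairwise = {
    (a, b): spearman(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

for pair, rho in pairwise.items():
    print(pair, round(rho, 3))
```

A rank-based measure is used here because the three evaluators need not share a scale; only the ordering of outputs has to agree for the text-only score to serve as a proxy.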
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Contextual Visual Personalization (Editing) | VPTT-Bench 1.0 (test) | VPTTscore (V) | 0.626 | 30 |
| Generation | VPTT-Bench 1.0 (test) | VPTT Score (Novelty Adjusted) | 0.644 | 15 |
| Visual Personalization | Visual Personalization Evaluation Set | VIPER Proxy Score (PS) | 97.4 | 12 |
| Image Generation and Editing | VPTT Human Study (6000 annotations) (test) | VPTTscore-c (Text) Avg | 0.464 | 4 |