Learning a Discriminative Model for the Perception of Realism in Composite Images
About
What makes an image appear realistic? In this work, we answer this question from a data-driven perspective by learning the perception of visual realism directly from large amounts of data. In particular, we train a Convolutional Neural Network (CNN) model that distinguishes natural photographs from automatically generated composite images. The model learns to predict the visual realism of a scene in terms of color, lighting and texture compatibility, without any human annotations pertaining to it. Our model outperforms previous work that relies on hand-crafted heuristics on the task of classifying realistic vs. unrealistic photos. Furthermore, we apply our learned model to compute optimal parameters of a compositing method, so as to maximize the visual realism score predicted by our CNN model. We demonstrate its advantage over existing methods via a human perception study.
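The last step of the abstract, choosing compositing parameters that maximize a realism score, can be illustrated with a small toy sketch. Everything here is an assumption made for illustration: `toy_realism_score` is a hypothetical stand-in that only compares mean colors (it is not the learned CNN), the compositing parameters are taken to be per-channel gains on the foreground, and a coarse grid search stands in for whatever optimizer the authors actually use.

```python
from itertools import product

def mean_color(pixels):
    """Average RGB of a list of (r, g, b) tuples with values in [0, 1]."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def apply_gain(pixels, gains):
    """Scale each channel by its gain and clip the result to [0, 1]."""
    return [
        tuple(min(1.0, max(0.0, p[c] * gains[c])) for c in range(3))
        for p in pixels
    ]

def toy_realism_score(foreground, background):
    """Hypothetical stand-in for a learned realism score.

    Returns a higher (less negative) value when the mean foreground
    color is closer to the mean background color.
    """
    fg, bg = mean_color(foreground), mean_color(background)
    return -sum((f - b) ** 2 for f, b in zip(fg, bg))

def best_gains(foreground, background, candidates=(0.6, 0.8, 1.0, 1.2, 1.4)):
    """Grid-search the per-channel gains that maximize the score."""
    return max(
        product(candidates, repeat=3),
        key=lambda g: toy_realism_score(apply_gain(foreground, g), background),
    )

# A gray foreground pasted onto a warm-toned background.
foreground = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]
background = [(0.6, 0.5, 0.4), (0.6, 0.5, 0.4)]
gains = best_gains(foreground, background)
```

With a learned model, the call to `toy_realism_score` would be replaced by the model's predicted realism for the adjusted composite; the surrounding search loop would keep the same shape.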
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Image Harmonization | iHarmony4 HFlickr | MSE | 315.4 | 58 |
| Image Harmonization | iHarmony4 (all) | MSE | 204.8 | 53 |
| Image Harmonization | iHarmony4 Hday2night | MSE | 136.7 | 51 |
| Image Harmonization | iHarmony4 HAdobe5k | MSE | 414.3 | 43 |
| Image Harmonization | S-Adobe5K (test) | MSE | 41.29 | 25 |
| Image Harmonization | iHarmony4 HCOCO | MSE | 79.82 | 20 |
| Image Harmonization | 99 real composite images (test) | B-T Score | 0.337 | 12 |
| Image Harmonization | iHarmony4 0%-5% foreground ratio | MSE | 33.3 | 12 |
| Image Harmonization | iHarmony4 5%-15% foreground ratio | MSE | 145.1 | 12 |
| Image Harmonization | iHarmony4 15%-100% foreground ratio | MSE | 682.7 | 12 |
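
For readers unfamiliar with the quantities in the table, the following is a minimal sketch of how a mean squared error and a foreground-ratio range might be computed for one image. The conventions are assumptions, not the benchmark's documented protocol: images are flat lists of intensity values, the error is averaged over all values, the mask is binary, and range boundaries are treated as lower-inclusive.

```python
def mse(pred, target):
    """Mean squared error over all values of two equally sized flat images."""
    if len(pred) != len(target):
        raise ValueError("images must have the same number of values")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def foreground_ratio(mask):
    """Fraction of pixels marked as foreground (1) in a flat binary mask."""
    return sum(mask) / len(mask)

def ratio_bucket(ratio):
    """Map a foreground ratio to one of the ranges listed in the table.

    Assumption: lower bounds are inclusive and upper bounds exclusive.
    """
    if ratio < 0.05:
        return "0%-5%"
    if ratio < 0.15:
        return "5%-15%"
    return "15%-100%"

# Tiny four-value example: a harmonized output, its ground truth, and a mask.
pred = [10, 20, 30, 40]
target = [12, 20, 27, 40]
mask = [0, 1, 0, 0]

error = mse(pred, target)
bucket = ratio_bucket(foreground_ratio(mask))
```

Lower MSE means the output is closer to the reference image, which is why the larger-foreground ranges, where more pixels can differ, tend to show larger errors.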