Partially Does It: Towards Scene-Level FG-SBIR with Partial Input
About
We scrutinise an important observation plaguing scene-level sketch research -- that a significant portion of scene sketches are "partial". A quick pilot study reveals: (i) a scene sketch does not necessarily contain all objects in the corresponding photo, due to the subjective holistic interpretation of scenes, (ii) there exist significant empty (white) regions as a result of object-level abstraction, and as a result, (iii) existing scene-level fine-grained sketch-based image retrieval (FG-SBIR) methods collapse as scene sketches become more partial. To solve this "partial" problem, we advocate for a simple set-based approach using optimal transport (OT) to model cross-modal region associativity in a partially-aware fashion. Importantly, we improve upon OT to further account for holistic partialness by comparing intra-modal adjacency matrices. Our proposed method is not only robust to partial scene sketches but also yields state-of-the-art performance on existing datasets.
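The set-based idea above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each sketch and photo is already encoded as a set of region feature vectors, uses standard balanced entropic OT (Sinkhorn iterations) rather than the paper's partially-aware formulation, and compares adjacency matrices with a Gromov-Wasserstein-style term chosen here purely for illustration. All function names (`cosine_cost`, `sinkhorn`, `ot_distance`, `adjacency_discrepancy`) and the parameters `eps` and `n_iters` are hypothetical.

```python
import numpy as np


def cosine_cost(X, Y):
    """Pairwise cosine distance between rows of X (n, d) and Y (m, d)."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return 1.0 - Xn @ Yn.T


def sinkhorn(C, a, b, eps=0.1, n_iters=200):
    """Entropy-regularised OT plan for cost C with marginals a (n,) and b (m,)."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)         # rescale to match column marginals
        u = a / (K @ v)           # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def ot_distance(sketch_regions, photo_regions, eps=0.1):
    """Transport cost between two sets of region features (assumed uniform weights)."""
    n, m = len(sketch_regions), len(photo_regions)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    C = cosine_cost(sketch_regions, photo_regions)
    T = sinkhorn(C, a, b, eps)
    return float((T * C).sum()), T


def adjacency_discrepancy(A_s, A_p, T):
    """Illustrative structure term: sum over i,j,k,l of
    (A_s[i,k] - A_p[j,l])^2 * T[i,j] * T[k,l].
    An assumption for this sketch, not the paper's exact objective."""
    diff = (A_s[:, None, :, None] - A_p[None, :, None, :]) ** 2  # (n, m, n, m)
    return float(np.einsum("ijkl,ij,kl->", diff, T, T))
```

Under these assumptions, a retrieval score for a sketch-photo pair could combine the value returned by `ot_distance` with `adjacency_discrepancy` computed on the same plan `T`. Because balanced OT forces every sketch region to be matched, handling genuinely partial sketches would need a relaxed or partial variant, which this sketch does not attempt.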
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Scene-level Fine-Grained SBIR | SketchyCOCO Complete Sketch | Top-1 Accuracy | 0.345 | 10 |
| Object-level Fine-Grained SBIR | QMUL-Shoe Complete Sketch V2 | Top-1 Accuracy | 39.9 | 9 |
| Scene-level Fine-Grained SBIR | SketchyScene Complete Sketch original | Acc.@1 | 35.7 | 9 |
| Scene-level Fine-Grained SBIR | SketchyScene Pmask 0.3 partial | Top-1 Acc | 20.6 | 9 |
| Scene-level Fine-Grained SBIR | SketchyScene Pmask 0.5 partial | Acc.@1 | 10.6 | 9 |