Unsupervised learning of object frames by dense equivariant image labelling

About

One of the key challenges of visual perception is to extract abstract models of 3D objects and object categories from visual measurements, which are affected by complex nuisance factors such as viewpoint, occlusion, motion, and deformations. Starting from the recent idea of viewpoint factorization, we propose a new approach that, given a large number of images of an object and no other supervision, can extract a dense object-centric coordinate frame. This coordinate frame is invariant to deformations of the images and comes with a dense equivariant labelling neural network that can map image pixels to their corresponding object coordinates. We demonstrate the applicability of this method to simple articulated objects and deformable objects such as human faces, learning embeddings from random synthetic transformations or optical flow correspondences, all without any manual supervision.
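The equivariance idea can be illustrated with a small sketch. Assume a labelling function that assigns an embedding to each pixel u of an image x, and a known warp g that maps x to a deformed image x'. If the labels are equivariant, the label at u in x should best match the label at g(u) in x'. The code below scores this with a soft-matching loss. A softmax over dot-product similarities gives a distribution p(v | u) over the pixels v of x', and the loss is the expected distance between v and g(u). This is a simplified illustration in the spirit of the approach, not the authors' implementation. The function `matching_loss`, the one-hot toy embeddings (standing in for network outputs) and the one-pixel translation used as g are all hypothetical choices made for the example.

```python
import math
from typing import Callable, Dict, Tuple

Pixel = Tuple[int, int]
Embedding = Tuple[float, ...]


def matching_loss(
    emb_x: Dict[Pixel, Embedding],
    emb_xw: Dict[Pixel, Embedding],
    warp: Callable[[Pixel], Tuple[float, float]],
    gamma: float = 1.0,
) -> float:
    """Hypothetical soft-matching loss: for each pixel u of x, the expected
    distance ||v - g(u)||**gamma under p(v | u), a softmax over dot products
    between u's embedding and every embedding of the warped image."""
    total = 0.0
    for u, a in emb_x.items():
        gu = warp(u)
        scores = {v: sum(ai * bi for ai, bi in zip(a, b)) for v, b in emb_xw.items()}
        top = max(scores.values())  # subtract the max for numerical stability
        weights = {v: math.exp(s - top) for v, s in scores.items()}
        z = sum(weights.values())
        total += sum((w / z) * math.dist(v, gu) ** gamma for v, w in weights.items())
    return total / len(emb_x)


def shift(u: Pixel) -> Tuple[int, int]:
    """Toy warp g: translate by one pixel along the first axis."""
    return (u[0] + 1, u[1])


def one_hot(k: int, n: int, scale: float) -> Embedding:
    return tuple(scale if i == k else 0.0 for i in range(n))


# Toy object: a 3x3 patch whose points are numbered 0..8; the warped image is 4x3.
obj = [(i, j) for i in range(3) for j in range(3)]
domain_w = [(i, j) for i in range(4) for j in range(3)]
zero = (0.0,) * len(obj)
emb_x = {u: one_hot(k, len(obj), 10.0) for k, u in enumerate(obj)}

# Equivariant labels: pixel v of the warped image carries the label of g^{-1}(v).
good = {v: emb_x.get((v[0] - 1, v[1]), zero) for v in domain_w}
# Non-equivariant labels: they stay on the pixel grid instead of moving with the object.
bad = {v: emb_x.get(v, zero) for v in domain_w}

print(round(matching_loss(emb_x, good, shift), 6))  # 0.0
print(round(matching_loss(emb_x, bad, shift), 6))   # 1.0
```

The equivariant labelling scores essentially zero, while labels that ignore the warp pay the full one-pixel displacement. A softmax match is used here instead of a hard argmax so that the loss stays differentiable with respect to the embeddings.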

James Thewlis, Hakan Bilen, Andrea Vedaldi • 2017

Related benchmarks

Task                 Dataset            Metric                           Result  Rank
Landmark Prediction  MAFL (test)        Mean Error (%)                   5.83    38
Landmark Regression  MAFL (test)        MSE (%)                          5.83    28
Landmark Prediction  300-W (test)       Landmark Prediction Error        7.97    12
Landmark Detection   MAFL (test)        Inter-ocular Distance Error (%)  4.02    10
Landmark Detection   300W (test)        Inter-ocular Distance Error      8.23    9
Landmark Prediction  Human 3.6M (test)  Error (Mixed Actions)            7.51    9
Landmark Detection   AFLW (M)           Inter-ocular Distance Error (%)  10.99   7
Landmark Detection   AFLW (R)           Inter-ocular Distance Error (%)  10.14   5
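Most of the landmark scores above are reported as inter-ocular-normalised errors. A common definition, assumed here, is the mean Euclidean distance between predicted and ground-truth landmarks in an image, divided by the ground-truth distance between the two eye landmarks and multiplied by 100. The benchmark figure is then this value averaged over the test images. The exact eye points used for normalisation (for example eye centres or outer eye corners) differ between benchmarks, so the function names, the landmark indices and the example coordinates below are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def interocular_error_percent(
    pred: Sequence[Point],
    gt: Sequence[Point],
    left_eye: int = 0,
    right_eye: int = 1,
) -> float:
    """Mean point-to-point error for one image, as a percentage of the
    ground-truth distance between the two (assumed) eye landmarks."""
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")
    iod = math.dist(gt[left_eye], gt[right_eye])
    mean_err = sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)
    return 100.0 * mean_err / iod


def dataset_error(
    preds: Sequence[Sequence[Point]], gts: Sequence[Sequence[Point]]
) -> float:
    """Average the per-image normalised error over a test set."""
    errors = [interocular_error_percent(p, g) for p, g in zip(preds, gts)]
    return sum(errors) / len(errors)


# Hypothetical five-point face: two eyes, nose, two mouth corners.
gt = [(30.0, 40.0), (70.0, 40.0), (50.0, 60.0), (35.0, 80.0), (65.0, 80.0)]
pred = [(x + 2.0, y) for x, y in gt]  # every landmark off by 2 px

print(interocular_error_percent(pred, gt))   # 5.0  (2 px / 40 px inter-ocular)
print(dataset_error([pred, gt], [gt, gt]))   # 2.5  (mean of 5.0 and 0.0)
```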