
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

About

We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it needs no task-specific modifications that require extensive human expertise. The approach involves two components: (i) a base model (feed-forward) that is trained to directly predict raw vision outputs, guided by a learned discrete code, and (ii) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction, and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.
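The two-stage inference flow described above can be sketched in a few lines. The following is a minimal toy illustration, not the paper's implementation: the model names, shapes, `CODE_LEN`, `VOCAB`, and the stand-in "networks" (random projections and a bucketing rule) are all hypothetical placeholders chosen only to show how the autoregressive code generator feeds the feed-forward base model.

```python
import numpy as np

rng = np.random.default_rng(0)

CODE_LEN, VOCAB = 4, 16  # hypothetical: a short discrete guiding code


def language_model(image_feat):
    """Component (ii), autoregressive: generate the discrete guiding
    code token by token, conditioned on the input image.
    Toy stand-in: each step takes the argmax of a random projection."""
    code = []
    for _ in range(CODE_LEN):
        counts = np.bincount(np.asarray(code, dtype=int), minlength=VOCAB)
        ctx = np.concatenate([image_feat, counts])
        logits = ctx @ rng.normal(size=(ctx.size, VOCAB))  # placeholder weights
        code.append(int(np.argmax(logits)))
    return code


def base_model(image, code):
    """Component (i), feed-forward: predict the dense, high-dimensional
    output directly, guided by the code.
    Toy stand-in: per-pixel output = code token picked by intensity bucket."""
    buckets = (image * (CODE_LEN - 1)).astype(int)
    return np.asarray(code)[buckets]


image = rng.random((8, 8))        # toy input "image"
feat = image.reshape(-1)          # toy image features
code = language_model(feat)       # stage (ii): generate the guiding code
output = base_model(image, code)  # stage (i): dense output, same spatial size
print(output.shape)               # (8, 8)
```

The division of labor mirrors the abstract: the autoregressive part only has to model a short, structured code sequence, while the feed-forward part produces the full-resolution output in a single pass.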

Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby • 2022

Related benchmarks

Task                          Dataset            Result        Rank
Semantic segmentation         ADE20K (val)       mIoU 43.71    2731
Semantic segmentation         ADE20K             mIoU 49.9     936
Depth Estimation              NYU v2 (test)      --            423
Depth Estimation              NYU Depth V2       RMSE 0.467    177
Panoptic Segmentation         COCO 2017 (val)    PQ 45.8       172
Depth Estimation              NYU v2 (val)       RMSE 0.467    53
Panoptic Segmentation         COCO               PQ 45.8       23
Monocular Depth Estimation    NYUv2              RMSE 0.467    18
Generic Segmentation          COCO               PQ 45.8       14
Image Colorization            ImageNet (val)     --            9
