Perceiver IO: A General Architecture for Structured Inputs & Outputs
About
A central goal of machine learning is the development of systems that can solve many problems in as many data domains as possible. Current architectures, however, cannot be applied beyond a small set of stereotyped settings, as they bake in domain & task assumptions or scale poorly to large inputs or outputs. In this work, we propose Perceiver IO, a general-purpose architecture that handles data from arbitrary settings while scaling linearly with the size of inputs and outputs. Our model augments the Perceiver with a flexible querying mechanism that enables outputs of various sizes and semantics, doing away with the need for task-specific architecture engineering. The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. As highlights, Perceiver IO outperforms a Transformer-based BERT baseline on the GLUE language benchmark despite removing input tokenization and achieves state-of-the-art performance on Sintel optical flow estimation with no explicit mechanisms for multiscale correspondence.
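The encode / process / decode pattern the abstract describes can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names (`attend`, `init_weights`, `perceiver_io_sketch`), the random untrained weights, and all sizes are assumptions chosen for illustration, and the sketch leaves out multi-head attention, layer normalization, and MLP blocks. It only shows why cost grows linearly with input and output size: the inputs and the output queries each interact with a fixed-size latent array and never with each other.

```python
import numpy as np


def attend(queries, context, w_q, w_k, w_v):
    """Single-head attention: every query row reads from all context rows."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_weights(rng, d_query, d_context, d_model):
    """Random (untrained) projection matrices, for illustration only."""
    w_q = rng.normal(size=(d_query, d_model)) / np.sqrt(d_query)
    w_k = rng.normal(size=(d_context, d_model)) / np.sqrt(d_context)
    w_v = rng.normal(size=(d_context, d_model)) / np.sqrt(d_context)
    return w_q, w_k, w_v


def perceiver_io_sketch(inputs, output_queries, num_latents=8, d_model=16,
                        num_latent_layers=2, seed=0):
    """inputs: (M, d_in); output_queries: (O, d_query); returns (O, d_model)."""
    rng = np.random.default_rng(seed)
    d_in = inputs.shape[1]
    d_query = output_queries.shape[1]

    # Latent array of fixed size N = num_latents (learned in a real model).
    latents = rng.normal(size=(num_latents, d_model))

    # Encode: latents query the inputs. Cost is O(M * N), linear in M.
    latents = attend(latents, inputs, *init_weights(rng, d_model, d_in, d_model))

    # Process: self-attention over latents only. Cost is O(N^2),
    # independent of the input and output sizes.
    for _ in range(num_latent_layers):
        w = init_weights(rng, d_model, d_model, d_model)
        latents = latents + attend(latents, latents, *w)

    # Decode: one query per desired output element. Cost is O(O * N),
    # linear in the number of outputs.
    w = init_weights(rng, d_query, d_model, d_model)
    return attend(output_queries, latents, *w)
```

Under these assumptions, changing the number of rows in `output_queries` changes the number of output rows without touching the rest of the model, which is the sense in which the querying mechanism supports outputs of various sizes.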
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Image Classification | ImageNet-1k (val) | -- | -- | 1469 |
| Natural Language Understanding | GLUE (dev) | SST-2 (Acc) | 89.9 | 518 |
| Optical Flow Estimation | KITTI 2015 (train) | Fl-epe | 4.98 | 446 |
| Natural Language Understanding | GLUE (test) | -- | -- | 416 |
| Optical Flow | Sintel (train) | AEPE (Clean) | 1.81 | 200 |
| Optical Flow | KITTI 2015 (test) | -- | -- | 109 |
| Optical Flow | Sintel Final (train) | EPE | 2.42 | 106 |
| Optical Flow | Sintel Clean (train) | EPE | 1.81 | 98 |
| Robotic Manipulation | RLBench | Avg Success Score | 0.494 | 62 |
| Optical Flow | Sintel Final | EPE | 2.42 | 59 |