
Token Bottleneck: One Token to Remember Dynamics

About

Deriving compact and temporally aware visual representations from dynamic scenes is essential for successful execution of sequential scene understanding tasks such as visual tracking and robotic manipulation. In this paper, we introduce Token Bottleneck (ToBo), a simple yet intuitive self-supervised learning pipeline that squeezes a scene into a bottleneck token and predicts the subsequent scene using minimal patches as hints. The ToBo pipeline facilitates the learning of sequential scene representations by conservatively encoding the reference scene into a compact bottleneck token during the squeeze step. In the reconstruction step, we guide the model to capture temporal dynamics by predicting the target scene using the bottleneck token along with a few target patches as hints. This design encourages the vision backbone to embed temporal dependencies, thereby enabling understanding of dynamic transitions across scenes. Extensive experiments on diverse sequential tasks, including video label propagation and robot manipulation in simulated environments, demonstrate the superiority of ToBo over baselines. Moreover, deploying our pre-trained model on physical robots confirms its robustness and effectiveness in real-world environments. We further validate the scalability of ToBo across different model scales. Code is available at https://github.com/naver-ai/tobo.
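
The data flow of the reconstruction step described above can be illustrated with a small, framework-free sketch. It is not taken from the ToBo code base: the function names, the 10% hint ratio, the 14x14 patch grid, and the placeholder embeddings are all assumptions chosen for illustration. It only shows how a single bottleneck token and a few visible target patches might be assembled into the decoder input, while the remaining target patches are left to be predicted.

```python
import random


def split_hint_and_masked(num_patches, hint_ratio, seed=0):
    """Randomly pick which target patches stay visible as hints.

    hint_ratio is a hypothetical hyperparameter; the abstract only says
    that "a few" target patches are used.
    """
    rng = random.Random(seed)
    num_hints = max(1, round(num_patches * hint_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    hints = sorted(order[:num_hints])
    masked = sorted(order[num_hints:])
    return hints, masked


def build_decoder_input(bottleneck_token, target_patches, hint_idx):
    """Assemble the reconstruction-step input.

    The input holds exactly one bottleneck token (the summary of the
    reference scene) followed by the hint patches of the target scene.
    Each entry is (kind, patch_index, embedding).
    """
    sequence = [("bottleneck", None, bottleneck_token)]
    sequence += [("hint", i, target_patches[i]) for i in hint_idx]
    return sequence


# Toy setup: a 14x14 patch grid (196 patches) with placeholder embeddings.
num_patches = 196
target_patches = [[float(i)] for i in range(num_patches)]
bottleneck_token = [0.0]  # stand-in for the encoder output of the squeeze step

hints, masked = split_hint_and_masked(num_patches, hint_ratio=0.1)
decoder_input = build_decoder_input(bottleneck_token, target_patches, hints)

print(len(hints), "hint patches,", len(masked), "patches to predict")
```

Because so few target patches are visible, a decoder fed this input would have to rely on the bottleneck token for most of the scene content, which is the pressure the abstract describes as encouraging the backbone to encode temporal dependencies.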

Taekyung Kim, Dongyoon Han, Byeongho Heo, Jeongeun Park, Sangdoo Yun • 2025

Related benchmarks

Task | Dataset | Result | Rank
Video Object Segmentation | DAVIS | J Mean: 58.4 | 66
Robotic Manipulation | Franka-Kitchen | Avg Success Rate: 68 | 39
Video Part Segmentation | VIP | mIoU: 0.34 | 36
Robot Learning | CortexBench | Adroit Score: 60.4 | 22
Robot Manipulation | MetaWorld | Score: 87.8 | 9
Robot Manipulation | Franka-Kitchen | Light On Success: 82 | 9
Pose Tracking | JHMDB | PCK@0.1: 47 | 8
Robot Policy Learning | RLBench | Button Success Rate: 41.2 | 8
Vision-based robot policy learning | Franka-Kitchen | Knob 1 Success Rate: 57 | 8
Robot Policy Learning | Franka Kitchen online evaluation in simulation | Knob Turn Success: 58.4 | 8
Showing 10 of 15 rows
