Follow Anything: Open-set detection, tracking, and following in real-time

About

Tracking and following objects of interest is critical to several robotics use cases, ranging from industrial automation to logistics and warehousing, to healthcare and security. In this paper, we present a robotic system to detect, track, and follow any object in real time. Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model: it is not restricted to concepts seen at training time and can be applied to novel classes at inference time using text, image, or click queries. Leveraging rich visual descriptors from large-scale pre-trained models (foundation models), FAn detects and segments objects by matching multimodal queries (text, images, clicks) against an input image sequence. The detected and segmented objects are then tracked across image frames while accounting for occlusion and object re-emergence. We demonstrate FAn on a real-world robotic system (a micro aerial vehicle) and report its ability to seamlessly follow the objects of interest in a real-time control loop. FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second. To enable rapid adoption, deployment, and extensibility, we open-source all our code on our project webpage at https://github.com/alaamaalouf/FollowAnything. We also encourage the reader to watch our 5-minute explainer video at https://www.youtube.com/watch?v=6Mgt3EPytrw.
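The abstract describes detection as matching a query against visual descriptors and tracking that survives occlusion and re-emergence. The snippet below is a minimal, hypothetical sketch of that idea, not FAn's actual code: it assumes each query (text, image, or click) and each candidate region in a frame have already been turned into fixed-length descriptor vectors by some pre-trained model, and it picks the region with the highest cosine similarity. The names `cosine_similarity`, `match_query`, and `track_sequence` and the 0.5 threshold are illustrative assumptions. A frame in which no region clears the threshold is reported as `None` (target lost, e.g. occluded), and the same query can re-acquire the target when it re-emerges in a later frame.

```python
import math
from typing import List, Optional, Sequence

Descriptor = Sequence[float]  # assumed: one feature vector per query/region


def cosine_similarity(a: Descriptor, b: Descriptor) -> float:
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # degenerate descriptor: treat as no match
    return dot / (norm_a * norm_b)


def match_query(query: Descriptor,
                candidates: List[Descriptor],
                threshold: float = 0.5) -> Optional[int]:
    """Index of the candidate region most similar to the query, or None
    if no candidate reaches the (hypothetical) similarity threshold."""
    best_idx: Optional[int] = None
    best_score = -math.inf
    for i, desc in enumerate(candidates):
        score = cosine_similarity(query, desc)
        if score > best_score:
            best_idx, best_score = i, score
    if best_idx is None or best_score < threshold:
        return None
    return best_idx


def track_sequence(query: Descriptor,
                   frames: List[List[Descriptor]],
                   threshold: float = 0.5) -> List[Optional[int]]:
    """Match the query in every frame; None marks frames where the target
    is lost, and a later match models re-detection after re-emergence."""
    return [match_query(query, regions, threshold) for regions in frames]


if __name__ == "__main__":
    query = [1.0, 0.0]
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],    # target visible as region 0
        [[0.0, 1.0]],                # target occluded: no good match
        [[0.1, 1.0], [0.95, 0.05]],  # target re-emerges as region 1
    ]
    print(track_sequence(query, frames))
```

Running it prints `[0, None, 1]`. A real system would replace the toy vectors with foundation-model features and would typically add a frame-to-frame tracker between re-detections; this sketch only shows the query-matching step.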

Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallabhula, Makram Chahine, Daniel M. Vogt, Robert J. Wood, Antonio Torralba, Daniela Rus • 2023

Related benchmarks

Task | Dataset | Metric | Result | Rank
Visual Active Tracking | UnrealCV Parking Lot scene | EL | 301 | 21
Embodied Visual Tracking | UrbanCity Unseen Virtual Environment | EL | 466 | 16
Embodied Visual Tracking | SimpleRoom Unseen Virtual Environment | EL | 500 | 16
Visual Active Tracking | UnrealCV UrbanRoad scene | EL | 409 | 11
Visual Active Tracking | UnrealCV | EL | 462 | 11
Visual Active Tracking | UnrealCV Snow Village scene | EL | 456 | 11
Visual Active Tracking | UnrealCV UrbanCity 4D | EL | 334 | 10
Visual Active Tracking | UnrealCV ComplexRoom 4D | EL | 351 | 10
Visual Active Tracking | UnrealCV Average - Distractor Environments | EL | 329 | 10
Action Prediction | VOT 2021 (8 selected videos) | Average Correct Action Rate | 72 | 6
Showing 10 of 13 rows
