Zero-Shot Multi-Animal Tracking in the Wild
About
Multi-animal tracking is crucial for understanding animal ecology and behavior, yet remains challenging due to variations in habitat, motion patterns, and species appearance. Traditional approaches typically require extensive fine-tuning and heuristic design for each new scenario. In this work, we explore vision foundation models for zero-shot multi-animal tracking. Building on SAM2MOT, we combine Grounding DINO with the Segment Anything Model2 (SAM 2) and introduce three targeted modifications to adapt the framework to animal appearance and behavior without any retraining or hyperparameter tuning between datasets. We also evaluate the recent SAM3 model, but identify practical limitations that restrict its applicability to multi-animal tracking in the wild. Our method achieves state-of-the-art results across Chimp-Act, Bird Flock Tracking, AnimalTrack, and a subset of GMOT-40, demonstrating robust generalization across diverse species and environments. The code is available at https://github.com/ecker-lab/SAM2-Animal-Tracking.
Related benchmarks
| Task | Dataset | Result | Rank | |
|---|---|---|---|---|
| Multi-Object Tracking | AnimalTrack (test) | HOTA58 | 13 | |
| Multi-animal tracking | ChimpACT (test) | HOTA58.6 | 6 | |
| Multi-animal tracking | BFT (test) | HOTA74.8 | 6 | |
| Multi-animal tracking | GMOT-40 Animal (test) | HOTA62.4 | 4 |