
SAMannot: A Memory-Efficient, Local, Open-source Framework for Interactive Video Instance Segmentation based on SAM2

About

High-fidelity video instance segmentation in research is bottlenecked by labor-intensive manual annotation, and existing alternatives force a compromise between costly commercial platforms and privacy-compromising cloud-based services. We present SAMannot, an open-source, local framework that integrates the Segment Anything Model 2 (SAM2) into a human-in-the-loop workflow. To address the high resource requirements of foundation models, we modified the SAM2 dependency and implemented a processing layer that minimizes computational overhead and maximizes throughput, ensuring a highly responsive user interface. Key features include persistent instance identity management, an automated "lock-and-refine" workflow with barrier frames, and a mask-skeletonization-based auto-prompting mechanism. SAMannot generates research-ready datasets in YOLO and PNG formats alongside structured interaction logs. Verified through animal behavior tracking use-cases and subsets of the LVOS and DAVIS benchmark datasets, the tool provides a scalable, private, and cost-effective alternative to commercial platforms for complex video annotation tasks.
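The auto-prompting idea mentioned above, i.e. deriving point prompts for the next frame from the current mask's skeleton, can be illustrated with a minimal sketch. The code below is not SAMannot's implementation: it approximates the skeleton ridge with a BFS distance transform over a pure-Python binary mask and picks the deepest, well-separated interior cells as positive point prompts. All function names and the separation heuristic are illustrative assumptions.

```python
from collections import deque

def distance_to_background(mask):
    """BFS distance from each foreground cell to the nearest background cell.

    `mask` is a 2D list of 0/1; assumes at least one background cell exists.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    # Seed the BFS with every background cell at distance 0.
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                dist[y][x] = 0
                q.append((y, x))
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                q.append((ny, nx))
    return dist

def ridge_point_prompts(mask, max_points=3):
    """Pick the deepest interior cells (skeleton-ridge stand-ins) as (x, y) prompts."""
    dist = distance_to_background(mask)
    cells = [(d, y, x) for y, row in enumerate(dist)
             for x, d in enumerate(row) if mask[y][x]]
    cells.sort(reverse=True)  # deepest interior cells first
    prompts, taken = [], []
    for d, y, x in cells:
        # Illustrative heuristic: keep prompts at least 3 cells apart (L1 distance).
        if all(abs(y - ty) + abs(x - tx) > 2 for ty, tx in taken):
            prompts.append((x, y))
            taken.append((y, x))
            if len(prompts) == max_points:
                break
    return prompts
```

In a real pipeline these points would be fed to the segmentation model as positive click prompts for the following frame; a proper skeletonization (e.g. morphological thinning) would replace the distance-ridge approximation used here.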

Gergely Dinya, András Gelencsér, Krisztina Kupán, Clemens Küpper, Kristóf Karacs, Anna Gelencsér-Horváth • 2026

Related benchmarks

Task                          Dataset                       Metric     Result   Rank
Video Instance Segmentation   DAVIS 2017 480p (train-val)   Mean IoU   0.9807   23
Video Instance Segmentation   LVOS subset (test)            Mean IoU   95.85    6
