Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
About
Egocentric videos offer fine-grained information for high-fidelity modeling of human behaviors. Hands and the objects they interact with are a crucial cue for understanding a viewer's behaviors and intentions. We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with during a diverse array of daily activities. Our dataset is the first to label detailed hand-object contact boundaries. We introduce a context-aware compositional data augmentation technique to adapt the model to out-of-distribution egocentric videos from YouTube. We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications, including hand state classification, video activity recognition, 3D mesh reconstruction of hand-object interactions, and video inpainting of hand-object foregrounds in egocentric videos. Dataset and code are available at: https://github.com/owenzlz/EgoHOS
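The compositional augmentation mentioned above pastes segmented hand-object foregrounds onto new egocentric backgrounds. A minimal sketch of just the pasting step, assuming NumPy images and a binary mask (the context-aware placement logic from the paper is omitted, and all array names and shapes here are illustrative assumptions, not the released code's API):

```python
import numpy as np

def composite(foreground, mask, background):
    """Paste masked foreground pixels onto a background image.

    foreground, background: (H, W, 3) uint8 images of the same size
    mask: (H, W) binary mask selecting the hand/object pixels
    """
    sel = mask.astype(bool)
    out = background.copy()
    out[sel] = foreground[sel]  # copy only the masked foreground pixels
    return out

# Toy example: a 2x2 bright "hand" patch composited onto a black background.
fg = np.full((2, 2, 3), 200, dtype=np.uint8)
bg = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)
out = composite(fg, mask, bg)
```

In practice the real augmentation must also choose plausible positions and backgrounds; this sketch only shows the mask-based compositing itself.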
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | EgoHOS in-domain (test) | Left Hand IoU | 90.38 | 13 |
| Egocentric Hand-Object Segmentation | EgoHOS out-of-domain (test) | Left Hand IoU | 81.77 | 11 |
| Egocentric Hand-Object Segmentation | mini-HOI4D out-of-distribution (test) | IoU (Left Hand) | 8.74 | 11 |
| Hand-object segmentation | EgoHOS out-of-domain (test) | Left Hand Accuracy | 0.8783 | 10 |
| Egocentric Referring Video Object Segmentation | VISOR (val) | mIoU | 55.1 | 10 |
| Hand-object segmentation | HOI4D mini | Left Hand Accuracy | 40.9 | 10 |
| Egocentric Referring Video Object Segmentation | VSCOS (test) | mIoU | 42.1 | 4 |
| Egocentric Referring Video Object Segmentation | VOST (test) | mIoU | 21.9 | 4 |
| Referring Video Object Segmentation | VISOR novel | mIoU | 45.8 | 4 |
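The IoU numbers above are standard per-class intersection-over-union between predicted and ground-truth label maps (mIoU averages this over classes). A minimal sketch, assuming integer label maps and an illustrative class ID of 1 for "left hand" (the actual EgoHOS class IDs are not specified here):

```python
import numpy as np

def per_class_iou(pred, gt, class_id):
    """Intersection-over-union for one class between two label maps."""
    p = pred == class_id
    g = gt == class_id
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / union if union > 0 else float("nan")

# Toy 4x4 label maps: 0 = background, 1 = left hand (assumed IDs).
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 1, 0],
               [1, 1, 1, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
print(per_class_iou(pred, gt, 1))  # intersection 4 / union 6 ≈ 0.667
```

On real benchmarks the intersection and union are typically accumulated over the whole test set before dividing, rather than averaged per image.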