Sticky-Glance: Robust Intent Recognition for Human Robot Collaboration via Single-Glance
About
Gaze is a valuable means of communication for people with impairments that severely limit their motor capabilities. However, robust gaze-based intent recognition in multi-object environments is challenging due to gaze noise, micro-saccades, viewpoint changes, and dynamic objects. To address this, we propose an object-centric gaze grounding framework that stabilizes intent through a sticky-glance algorithm, which jointly models geometric distance and direction trends. The inferred intent remains anchored to the object even under short glances of as few as 3 gaze samples, achieving a tracking rate of 0.94 for dynamic targets and a selection accuracy of 0.98 for static targets. We further introduce a continuous shared control and multi-modal interaction paradigm that enables high-readiness control and human-in-the-loop feedback, reducing task duration by nearly 10%. Experiments covering dynamic tracking, multi-perspective alignment, a baseline comparison, user studies, and ablation studies demonstrate improved robustness and efficiency, and reduced workload, compared to representative baselines.
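The abstract does not spell out the sticky-glance algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes 2D gaze samples and object centers in a shared normalized frame, scores each object by distance from the latest sample plus a penalty when the gaze is not moving toward it, and keeps the current target unless a challenger wins by a margin. The names `score` and `sticky_select`, the weights `w_dist` and `w_dir`, and the `margin` threshold are all hypothetical.

```python
import math

def score(gaze, center, w_dist=1.0, w_dir=0.5):
    """Lower is better: distance term plus a direction-trend penalty (assumed form)."""
    (x0, y0), (x1, y1) = gaze[0], gaze[-1]
    to_obj = (center[0] - x1, center[1] - y1)   # latest sample -> object
    move = (x1 - x0, y1 - y0)                   # net gaze displacement over the window
    dist = math.hypot(*to_obj)
    norm = math.hypot(*move) * dist
    if norm < 1e-9:
        penalty = 0.0                           # no usable direction information
    else:
        cos = (move[0] * to_obj[0] + move[1] * to_obj[1]) / norm
        penalty = 1.0 - cos                     # 0 when heading at the object, 2 when heading away
    return w_dist * dist + w_dir * penalty

def sticky_select(gaze, objects, current=None, margin=0.2):
    """Return the selected object name, keeping `current` unless clearly beaten."""
    if len(gaze) < 3:                           # mirror the 3-sample minimum from the abstract
        return current
    scores = {name: score(gaze, c) for name, c in objects.items()}
    best = min(scores, key=scores.get)
    if current in scores and scores[current] - scores[best] < margin:
        return current                          # hysteresis: stay anchored to the current target
    return best

# Hypothetical scene: two objects on a line, gaze near the midpoint.
objects = {"cup": (0.0, 0.0), "ball": (1.0, 0.0)}
jitter = [(0.52, -0.02), (0.52, 0.0), (0.52, 0.02)]   # small drift with no trend toward either object
sweep = [(0.6, 0.0), (0.75, 0.0), (0.9, 0.0)]         # steady movement toward the ball

print(sticky_select(jitter, objects, current="cup"))  # stays on "cup"
print(sticky_select(sweep, objects, current="cup"))   # switches to "ball"
```

The margin is what makes the selection "sticky" in this sketch: with `margin=0`, the jitter case would flip to the marginally closer object.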
Related benchmarks
| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Multi-perspective Alignment | Multi-Perspective Alignment Environment (test) | Tracking Rate | 100 | 45 |
| Intent Recognition | Scenario 1 Dynamic | Tracking Rate | 92 | 6 |
| Intent Recognition | Scenario Static S1 | Selection Accuracy | 98 | 6 |
| Robot Task Execution | Robot Task Scenarios Scenario S3 | Command Duration (s) | 4.2 | 5 |
| Robot Task Execution | Robot Task Scenarios Scenario S4 | Command Duration (s) | 1.4 | 5 |
| User Study | User Study | NASA-TLX Score | 25.57 | 5 |