ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

About

Egocentric interaction perception is an essential branch of research on human-environment interaction and lays the basis for developing next-generation intelligent systems. However, existing egocentric interaction understanding methods cannot simultaneously yield coherent textual and pixel-level responses to user queries, which limits their flexibility across varying downstream applications. To comprehend egocentric interactions exhaustively, this paper presents a novel task named Egocentric Interaction Reasoning and pixel Grounding (Ego-IRG). Taking an egocentric image and a query as input, Ego-IRG is the first task that aims to resolve the interactions through three crucial steps: analyzing, answering, and pixel grounding, producing fluent textual and fine-grained pixel-level responses. Another challenge is that existing datasets do not meet the requirements of the Ego-IRG task. To address this limitation, this paper creates the Ego-IRGBench dataset through extensive manual effort; it includes over 20k egocentric images with 1.6 million queries and corresponding multimodal responses about interactions. Moreover, we design a unified ANNEXE model that uses multimodal large language models to generate text- and pixel-level outputs, enabling a comprehensive interpretation of egocentric interactions. Experiments on Ego-IRGBench demonstrate the effectiveness of the ANNEXE model compared with other works.

Yuejiao Su, Yi Wang, Qiongyang Hu, Chuang Yang, Lap-Pui Chau • 2025

Related benchmarks

| Task | Dataset | Metric | Result | Rank |
|---|---|---|---|---|
| Semantic segmentation | EgoHOS in-domain (test) | Left Hand IoU | 91.5 | 13 |
| Egocentric Hand-Object Segmentation | mini-HOI4D out-of-distribution (test) | IoU (Left Hand) | 68.06 | 11 |
| Egocentric Hand-Object Segmentation | EgoHOS out-of-domain (test) | Left Hand IoU | 92.45 | 11 |
| Hand-object segmentation | EgoHOS out-of-domain (test) | Left Hand Accuracy | 0.9703 | 10 |
| Hand-object segmentation | HOI4D mini | Left Hand Accuracy | 96.54 | 10 |
| Analyzing sub-task | Ego-IRGBench (val) | METEOR | 0.563 | 5 |
| Analyzing sub-task | Ego-IRGBench (test) | METEOR | 0.563 | 5 |
| Referring Image Segmentation | Ego-IRGBench (val) | cIoU | 35.14 | 5 |
| Referring Image Segmentation | Ego-IRGBench (test) | cIoU | 36.02 | 5 |
| Answering | Ego-IRGBench 1.0 (val) | METEOR | 36.3 | 4 |
Showing 10 of 11 rows
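
The benchmark rows report both IoU and cIoU, and the two can differ noticeably on the same predictions. This page does not state how either metric is computed for these benchmarks, so the following is only a minimal sketch, assuming the definitions commonly used in the referring segmentation literature: per-image IoU, and cumulative IoU (total intersection over total union across the whole set). The function names `iou` and `ciou` are illustrative, and the convention of returning 1.0 for two empty masks is an assumption.

```python
import numpy as np


def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union for one pair of binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def ciou(preds, gts) -> float:
    """Cumulative IoU: summed intersections divided by summed unions."""
    total_inter = 0
    total_union = 0
    for pred, gt in zip(preds, gts):
        pred, gt = pred.astype(bool), gt.astype(bool)
        total_inter += int(np.logical_and(pred, gt).sum())
        total_union += int(np.logical_or(pred, gt).sum())
    if total_union == 0:
        return 1.0
    return total_inter / total_union


# Two toy 2x2 masks: one half-right small object, one fully right large object.
preds = [np.array([[1, 1], [0, 0]]), np.array([[1, 1], [1, 1]])]
gts = [np.array([[1, 0], [0, 0]]), np.array([[1, 1], [1, 1]])]

mean_iou = sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)
cumulative_iou = ciou(preds, gts)
```

Under these assumptions the toy example has a mean per-image IoU of 0.75 but a cumulative IoU of 5/6, because cumulative IoU weights large objects more heavily than small ones. Leaderboards often report such values multiplied by 100.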
