EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding

About

Understanding human-environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still struggle with accurate interaction reasoning and fine-grained pixel grounding. To address this, the paper introduces EARL, an Egocentric Analysis-guided Reinforcement Learning framework that explicitly transfers coarse interaction semantics to query-oriented answering and grounding. Specifically, EARL adopts a two-stage parsing framework comprising coarse-grained interpretation and fine-grained response. The first stage holistically interprets egocentric interactions and generates a structured textual description. The second stage produces the textual answer and pixel-level mask in response to the user query. To bridge the two stages, we extract a global interaction descriptor as a semantic prior, which is integrated via a novel Analysis-guided Feature Synthesizer (AFS) for query-oriented reasoning. To optimize heterogeneous outputs, including textual answers, bounding boxes, and grounding masks, we design a multi-faceted reward function and train the response stage with GRPO. Experiments on Ego-IRGBench show that EARL achieves 65.48% cIoU for pixel grounding, outperforming previous RL-based methods by 8.37%, while out-of-distribution (OOD) grounding results on EgoHOS indicate strong transferability to unseen egocentric grounding scenarios.
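The abstract names a multi-faceted reward and GRPO training but gives no formulas. The following is a minimal sketch of the general idea only, not the authors' implementation: it assumes the reward is a weighted sum of a text score, a box IoU, and a mask IoU, and it applies the group-relative normalization commonly associated with GRPO. The function names, the equal weights, and the use of the population standard deviation are all hypothetical choices made for illustration.

```python
from statistics import mean, pstdev


def composite_reward(text_score, box_iou, mask_iou, weights=(1.0, 1.0, 1.0)):
    """Hypothetical multi-faceted reward: a weighted sum of three scores in [0, 1].

    The actual reward terms and weights used by EARL are not given in the abstract.
    """
    w_text, w_box, w_mask = weights
    return w_text * text_score + w_box * box_iou + w_mask * mask_iou


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled responses.

    Each response's advantage is its reward minus the group mean,
    divided by the group standard deviation (population std assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same query, scored on text, box, and mask.
group = [
    composite_reward(0.9, 0.8, 0.7),
    composite_reward(0.5, 0.4, 0.3),
    composite_reward(0.1, 0.0, 0.0),
]
advantages = group_relative_advantages(group)
print(advantages)
```

Because advantages are computed relative to the other responses in the same group, no separate value network is needed to provide a baseline, which is the usual motivation given for this style of normalization.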

Yuejiao Su, Xinshen Zhang, Zhen Ye, Lei Yao, Lap-Pui Chau, Yi Wang • 2026

Related benchmarks

Task | Dataset | Metric | Result | Rank
--- | --- | --- | --- | ---
Pixel Grounding | EgoHOS Out-of-distribution (test) | Left Hand IoU | 52.3 | 18
Egocentric Interaction Answering | Ego-IRGBench (test) | METEOR | 93.9 | 15
Egocentric Interaction Answering | Ego-IRGBench (val) | METEOR | 0.933 | 15
Egocentric Interaction Grounding | Ego-IRGBench (test) | cIoU | 65.48 | 15
Egocentric Interaction Grounding | Ego-IRGBench (val) | cIoU | 62.71 | 15
Egocentric Interaction Analysis | Ego-IRGBench (test) | METEOR | 0.542 | 15
Egocentric Interaction Analysis | Ego-IRGBench (val) | METEOR | 0.541 | 15
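
The grounding rows above report cIoU without defining it. As a hedged illustration, the sketch below assumes cIoU means the cumulative IoU commonly used in referring segmentation, where intersections and unions are summed over the whole dataset before dividing; the benchmark's exact protocol may differ. The helper names and toy masks are hypothetical.

```python
import numpy as np


def cumulative_iou(preds, gts):
    """Sum intersections and unions over all samples, then divide once."""
    inter_total = 0
    union_total = 0
    for pred, gt in zip(preds, gts):
        p = np.asarray(pred, dtype=bool)
        g = np.asarray(gt, dtype=bool)
        inter_total += int(np.logical_and(p, g).sum())
        union_total += int(np.logical_or(p, g).sum())
    return inter_total / union_total if union_total > 0 else 0.0


def mean_iou(preds, gts):
    """Average of per-sample IoU, shown for contrast with the cumulative form."""
    scores = []
    for pred, gt in zip(preds, gts):
        p = np.asarray(pred, dtype=bool)
        g = np.asarray(gt, dtype=bool)
        union = int(np.logical_or(p, g).sum())
        inter = int(np.logical_and(p, g).sum())
        scores.append(inter / union if union > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: one small, half-correct mask and one large, fully correct mask.
preds = [np.array([[1, 1], [0, 0]]), np.ones((4, 4), dtype=int)]
gts = [np.array([[1, 0], [0, 0]]), np.ones((4, 4), dtype=int)]
print(cumulative_iou(preds, gts))  # 17 / 18, dominated by the large mask
print(mean_iou(preds, gts))        # (0.5 + 1.0) / 2 = 0.75
```

The toy example shows why the two metrics can diverge: the cumulative form weights each sample by its pixel area, so large objects dominate, whereas the per-sample mean treats every mask equally.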