
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

About

In this paper, we introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise descriptions of the camera wearer's activities and leverages the zero-shot capabilities of pretrained large language models to reason over long-form video context. Furthermore, LifelongMemory uses a confidence and explanation module to produce confident, high-quality, and interpretable answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D. Code is available at https://github.com/agentic-learning-ai-lab/lifelong-memory.
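The pipeline described above (captions of the camera wearer's activity, zero-shot LLM reasoning, then a confidence and explanation step) can be illustrated with a minimal sketch. This is not the authors' implementation: `Caption`, `Answer`, `build_prompt`, `answer_query`, the 1-to-3 confidence scale, and the idea of sampling the model several times are all assumptions made for illustration, and the `llm` argument is a stand-in callable rather than any real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Caption:
    """One short description of what the camera wearer (C) is doing."""
    start_s: float
    end_s: float
    text: str


@dataclass
class Answer:
    """A model reply: chosen option, self-reported confidence, explanation."""
    choice: int
    confidence: int  # hypothetical scale: 1 (low) to 3 (high)
    explanation: str


def build_prompt(captions: Sequence[Caption], question: str,
                 options: Sequence[str]) -> str:
    """Condense the video into timestamped caption lines plus the query."""
    lines = [f"[{c.start_s:.0f}s-{c.end_s:.0f}s] {c.text}" for c in captions]
    opts = [f"({i}) {o}" for i, o in enumerate(options)]
    return "\n".join(
        ["Video captions:"] + lines
        + [f"Question: {question}", "Options:"] + opts
        + ["Answer with an option index, a confidence from 1 to 3, "
           "and a short explanation."]
    )


def answer_query(captions: Sequence[Caption], question: str,
                 options: Sequence[str],
                 llm: Callable[[str], Answer],
                 n_samples: int = 3) -> Answer:
    """Query the (assumed) LLM several times and keep the most confident reply."""
    prompt = build_prompt(captions, question, options)
    replies: List[Answer] = [llm(prompt) for _ in range(n_samples)]
    # max() returns the first reply among those tied for highest confidence.
    return max(replies, key=lambda a: a.confidence)
```

In practice the `llm` callable would wrap a real model and parse its text output into an `Answer`; here it is left abstract so the sketch stays self-contained. Returning the explanation alongside the choice is what makes the answer inspectable.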

Ying Wang, Yanlai Yang, Mengye Ren • 2023

Related benchmarks

Task                                 Dataset                         Result               Rank
Video Question Answering             EgoSchema (Full)                Accuracy 64.7        193
Video Question Answering             EgoSchema subset                Accuracy 72          73
Video Question Answering             EgoSchema 500-question subset   Accuracy 68          50
Video Question Answering             EgoSchema 5031 videos (test)    Top-1 Accuracy 62.4  26
Egocentric Video Question Answering  EgoSchema (public leaderboard)  Accuracy 68          13