LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos

About

In this paper we introduce LifelongMemory, a new framework for accessing long-form egocentric videographic memory through natural language question answering and retrieval. LifelongMemory generates concise video activity descriptions of the camera wearer and leverages the zero-shot capabilities of pretrained large language models to perform reasoning over long-form video context. Furthermore, LifelongMemory uses a confidence and explanation module to produce confident, high-quality, and interpretable answers. Our approach achieves state-of-the-art performance on the EgoSchema benchmark for question answering and is highly competitive on the natural language query (NLQ) challenge of Ego4D. Code is available at https://github.com/agentic-learning-ai-lab/lifelong-memory.

Ying Wang, Yanlai Yang, Mengye Ren • 2023
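
The pipeline outlined in the abstract (caption the video, let a pretrained LLM reason over the captions, then attach a confidence and an explanation to the answer) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The prompt wording, the reply format, the 1-3 confidence scale, and the names `Answer`, `build_prompt`, `parse_reply`, `answer_question`, and `is_confident` are all assumptions made for illustration, and `llm` stands for any callable that maps a prompt string to a reply string.

```python
import re
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Answer:
    choice: int        # index of the selected option
    confidence: int    # self-reported confidence (assumed scale: 1 = low, 3 = high)
    explanation: str   # free-text rationale returned by the model


def build_prompt(captions: Sequence[str], question: str, options: Sequence[str]) -> str:
    """Join indexed captions with the question and the answer options."""
    caption_lines = [f"[{i}] {c}" for i, c in enumerate(captions)]
    option_lines = [f"({i}) {o}" for i, o in enumerate(options)]
    return (
        "Captions of the camera wearer's activity:\n" + "\n".join(caption_lines)
        + f"\n\nQuestion: {question}\nOptions:\n" + "\n".join(option_lines)
        + "\n\nReply as: answer=<index>; confidence=<1-3>; explanation=<text>"
    )


# Hypothetical reply format; a real system would need more robust parsing.
_REPLY = re.compile(r"answer=(\d+);\s*confidence=(\d+);\s*explanation=(.*)", re.S)


def parse_reply(reply: str) -> Answer:
    """Extract the chosen option, confidence, and explanation from a reply."""
    match = _REPLY.search(reply)
    if match is None:
        raise ValueError(f"unparseable reply: {reply!r}")
    return Answer(int(match.group(1)), int(match.group(2)), match.group(3).strip())


def answer_question(
    llm: Callable[[str], str],
    captions: Sequence[str],
    question: str,
    options: Sequence[str],
) -> Answer:
    """Ask the LLM one multiple-choice question about the captioned video."""
    return parse_reply(llm(build_prompt(captions, question, options)))


def is_confident(answer: Answer, threshold: int) -> bool:
    """Keep only answers whose self-reported confidence meets the threshold."""
    return answer.confidence >= threshold
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client, so a stub function can stand in for the model when testing the prompt and parsing logic.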

Related benchmarks

Task	Dataset	Result
Video Question Answering	EgoSchema (Full)	Accuracy 64.7	241
Video Question Answering	EgoSchema subset	Accuracy 72	124
Video Question Answering	EgoSchema 500-question subset	Accuracy 68	50
Video Question Answering	EgoSchema 5031 videos (test)	Top-1 Accuracy 62.4	26
Egocentric Video Question Answering	EgoSchema (public leaderboard)	Accuracy 68	13

