
Episodic-memory Question Answering on OpenEQA v1 (HM3D)

85.1 LLM Match Score (Human baseline)

Updated 5mo ago

Evaluation Results

Method	Date	LLM Match Score
Human baseline	2025.12	85.1
R4	2025.12	76.96
GPT-4V	2025.12	46.6
AlanaVLM	2025.12	44.8
GPT-4 w/ LLaVA-1.5	2025.12	40
GPT-4	2025.12	35.5
GPT-4 w/ SVM	2025.12	35
GPT-4 w/ CG	2025.12	34
LLaMA-2 w/ LLaVA-1.5	2025.12	31.1
LLaMA-2 w/ SVM	2025.12	30.9
LLaMA-2	2025.12	29
LLaMA-2 w/ CG	2025.12	24.2