| Dataset Name | SOTA Method | Metric | Trend | | |
|---|---|---|---|---|---|
| OpenEQA v1.0 (test) | CoV | LLM-Match 67.7 | 32 | 3mo ago | |
| A-EQA | Human Agent | Overall (LLM-Match) 85.1 | 25 | 2mo ago | |
| OpenEQA | MiMo-Embodied-7B | Score 74.1 | 21 | 29d ago | |
| Episodic Memory EQA (EM-EQA) | | LLM Match 86.8 | 19 | 1mo ago | |
| OpenEQA EM-EQA | | Accuracy 86.8 | 19 | 3mo ago | |
| BridgeEQA 1,100 QA pairs (test) | EMVR VLM w/ Images + SG | Answer Correctness 64.8 | 15 | 1mo ago | |
| BridgeEQA (test) | EMVR VLM w/ Images + SG | Image Citation Relevance 88.9 | 15 | 1mo ago | |
| RoboVQA | Tarsier2 | BLEU-1 77.1 | 13 | 3mo ago | |
| A-EQA 1.0 (test) | MetaNav | LLM-Match 58.3 | 10 | 2mo ago | |
| OpenEQA EM-EQA Episodes up to 32 frames | | LLM-Match Score 86.8 | 10 | 2mo ago | |
| OpenEQA (test) | | GPT-4 Score 86.8 | 10 | 2mo ago | |
| EgoTaskQA (test) | | Exact Match 80 | 10 | 3mo ago | |
| OpenEQA EM-EQA | Human | LLM-Match 86.8 | 8 | 3mo ago | |
| A-EQA 184 (subset) | 3D-Mem | LLM-Match 52.6 | 7 | 3mo ago | |
| NaVQA | Graph Memory | Response Latency (s) 9.97 | 6 | 1mo ago | |
| 10-drone scenario simulation | MCPA | EQA Accuracy 95.35 | 6 | 1mo ago | |
| A-EQA 184-question subset | HGR | LLM-Match 55.9 | 6 | 1mo ago | |
| HM-EQA | FAST-EQA | SR 69.2 | 6 | 3mo ago | |
| NiEH EQA (test) | LGT | Accuracy 56.34 | 6 | 3mo ago | |
| DynHiL-EQA (Static) | DIVRR | Accuracy 58.2 | 5 | 2mo ago | |
| DynHiL-EQA Dynamic | DIVRR | Accuracy 55.1 | 5 | 2mo ago | |
| DynHiL-EQA (all) | DIVRR | Accuracy 56.6 | 5 | 2mo ago | |
| HM-EQA (static) | DIVRR | Accuracy 63.8 | 5 | 2mo ago | |
| OpenEQA v1 (test) | | Score 85.1 | 5 | 3mo ago | |
| MP3D-EQA | Uni-LaViRA | Accuracy 54.7 | 4 | 6d ago | |