| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Question Answering | MetaQA 3-hop | Hits@1 | 100 | 47 |
| Question Answering | MetaQA 3-hop | Accuracy | 14.7 | 30 |
| Question Answering | MetaQA 2-hop | Accuracy | 63.17 | 30 |
| Question Answering | MetaQA 1-hop | Accuracy | 82.77 | 30 |
| Question Answering | MetaQA | 1-hop EM@1 | 92.7 | 30 |
| Knowledge Base Question Answering | MetaQA 1hop | Hits@1 | 100 | 28 |
| Question Answering | MetaQA 2-hop | Hits@1 | 100 | 28 |
| Knowledge Graph Question Answering | MetaQA 2-hop (test) | Hits@1 | 100 | 24 |
| Knowledge Graph Question Answering | MetaQA 3-hop | Accuracy | 89.6 | 16 |
| Knowledge Graph Question Answering | MetaQA 1-hop | Accuracy | 87.6 | 16 |
| Multi-hop Knowledge Graph Question Answering | MetaQA | Hit@1 (2-hop) | 100 | 11 |
| Question Answering | MetaQA 3-hop | Normalized Latency (vs Vanilla RAG) | 0.86 | 9 |
| Question Answering | MetaQA 2-hop | Normalized Average Latency | 0.83 | 9 |
| Question Answering | MetaQA 1-hop | Average Latency (Normalized) | 0.8 | 9 |
| Question Answering | MetaQA 3-hop | Average Edge Budget | 2.1 | 9 |
| Question Answering | MetaQA 2-hop | Average Edge Budget | 0.78 | 9 |
| Question Answering | MetaQA 1-hop | Hits@1 | 97.5 | 9 |
| Knowledge Graph Question Answering | MetaQA 2-hop 1.0 (test) | Accuracy | 94.4 | 9 |
| Knowledge Graph Question Answering | MetaQA 1-hop 1.0 (test) | Accuracy (%) | 97.5 | 9 |
| Knowledge Graph Question Answering | MetaQA 50% KG setting (test) | Hits@1 (1-hop) | 76 | 9 |
| Hallucination Detection | MetaQA 1hop (Qwen2.5-7B) | AUC | 85.28 | 7 |
| Hallucination Detection | MetaQA 1hop (LLaMA2-7B) | AUC | 83.41 | 7 |
| Question Answering | MetaQA 1-hop wikimovies | Hits@1 (KB) | 97.5 | 6 |
| Question Answering | MetaQA | HS | 95.5 | 4 |
| Stealthiness evaluation | MetaQA | Detected Samples | 8,321 | 3 |