| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Knowledge Conflict Resolution | PAVE (test) | IE 59 | 45 |
| Segmentation | PAVE | mIoU 20.6 | 18 |
| Text Generation | PAVE | CIDEr 41.97 | 14 |
| Depth Estimation | PAVE | Depth Accuracy 48.95 | 8 |
| LLM Arbitration | PAVE Dimension 2: Temporal Setting v1 (test) | CR (KU) 94.81 | 7 |
| LLM Arbitration | PAVE Dimension 1 Counterfactual Setting v1 (test) | Margin 0.661 | 7 |
| Agent Norm Conversion | PAVE Environment Scenario 3 Jaywalker | CRD 110 | 4 |
| Hallucination Mitigation | PAVE | CHAIRi Score 26.78 | 4 |