PAVE

Benchmarks

Task Name	Dataset Name	SOTA Result
Knowledge Conflict Resolution	PAVE (test)	IE 59	45
Segmentation	PAVE	mIoU 20.6	18
Text Generation	PAVE	CIDEr 41.97	14
Depth Estimation	PAVE	Depth Accuracy 48.95	8
LLM Arbitration	PAVE Dimension 2: Temporal Setting v1 (test)	CR (KU) 94.81	7
LLM Arbitration	PAVE Dimension 1 Counterfactual Setting v1 (test)	Margin 0.661	7
Agent Norm Conversion	PAVE Environment Scenario 3 Jaywalker	CRD110	4
Hallucination Mitigation	PAVE	CHAIRi Score 26.78	4
