PAVE

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Knowledge Conflict Resolution | PAVE (test) | IE 59 | 45 |
| Segmentation | PAVE | mIoU 20.6 | 18 |
| Text Generation | PAVE | CIDEr 41.97 | 14 |
| Depth Estimation | PAVE | Depth Accuracy 48.95 | 8 |
| LLM Arbitration | PAVE Dimension 2: Temporal Setting v1 (test) | CR (KU) 94.81 | 7 |
| LLM Arbitration | PAVE Dimension 1 Counterfactual Setting v1 (test) | Margin 0.661 | 7 |
| Agent Norm Conversion | PAVE Environment Scenario 3 Jaywalker | CRD110 | 4 |
| Hallucination Mitigation | PAVE | CHAIRi Score 26.78 | 4 |