Agent Rule Compliance and Recovery on PAVE Environment Scenario 1: Fire
VRcafe Score: 0.81
[Chart: PAVE, May 19, 2026. Series: VRcafe Score, URV Score, Trec Score]
Updated 14d ago
Evaluation Results
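| Method | Configuration | Date | VRcafe Score | URV Score | Trec Score |
|--------|---------------|------|--------------|-----------|------------|
| PAVE | Backbone=GPT-4o | 2026.05 | 0.81 | 0.02 | 4.2 |
| PAVE | Backbone=Claude-3.5 So... | 2026.05 | 0.78 | 0.02 | 4.6 |
| PAVE | Backbone=Llama-3-70B | 2026.05 | 0.74 | 0.03 | 5.1 |
| PAVE | Backbone=GPT-4o-mini | 2026.05 | 0.62 | 0.06 | 6.4 |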