Agent Trajectory Verification Agreement on WebVoyager
[Chart: Unterminated Rate (%) for Native Verifier; 3.4 on Apr 5, 2026]
Updated 9d ago
Evaluation Results
| Metric | Entry 1 | Entry 2 |
| --- | --- | --- |
| Method | Native Verifier | Native Verifier |
| Details (truncated in source) | Agent Model=GPT-5, N (... | Agent Model=Fara-7B, N... |
| Date | 2026.04 | 2026.04 |
| Unterminated Rate (%) | 3.4 | 4.2 |
| Success Rate (Native Verifier) | 90.6 | 74.6 |
| Success Rate (UV Process) | 79.4 | 49 |
| Success Rate (UV Outcome) | 71 | 37.9 |
| FNR (Native vs UV Process) | 4 | 6 |
| FPR (Native vs UV Process) | 68 | 56 |
| Accuracy (Native vs UV Process) | 83 | 69 |
| F1 Score (Native vs UV Process) | 90 | 75 |
| Cohen's Kappa (Native vs UV Process) | 0.36 | 0.38 |
| FNR (Native vs UV Outcome) | 2 | 1 |
| FPR (Native vs UV Outcome) | 72 | 60 |
| Accuracy (Native vs UV Outcome) | 78 | 63 |
| F1 Score (Native vs UV Outcome) | 86 | 68 |
| Cohen's Kappa (Native vs UV Outcome) | 0.33 | 0.33 |
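The agreement columns (FNR, FPR, Accuracy, F1 Score, Cohen's Kappa) are standard confusion-matrix statistics. The page does not say how they are computed, so the following is only a minimal sketch under two assumptions: the UV label is treated as the reference and the Native Verifier label as the prediction, and each trajectory gets a single success/failure label from each verifier. The function name `agreement_metrics` and the toy labels are hypothetical and are not taken from the benchmark.

```python
from typing import Sequence


def agreement_metrics(native: Sequence[bool], reference: Sequence[bool]) -> dict:
    """Agreement of native-verifier labels with reference labels.

    Assumption: `reference` holds the UV labels and is treated as ground truth.
    """
    if len(native) != len(reference) or not native:
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(native, reference))
    tp = sum(1 for n, r in pairs if n and r)          # both say success
    fp = sum(1 for n, r in pairs if n and not r)      # native success, reference failure
    fn = sum(1 for n, r in pairs if not n and r)      # native failure, reference success
    tn = sum(1 for n, r in pairs if not n and not r)  # both say failure
    total = len(pairs)

    def ratio(num: float, den: float) -> float:
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    accuracy = ratio(tp + tn, total)
    # Chance agreement from the two verifiers' marginal success/failure rates.
    p_e = ((tp + fp) / total) * ((tp + fn) / total) \
        + ((fn + tn) / total) * ((fp + tn) / total)

    return {
        "fnr": ratio(fn, tp + fn),
        "fpr": ratio(fp, fp + tn),
        "accuracy": accuracy,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "kappa": ratio(accuracy - p_e, 1 - p_e),
    }


# Hypothetical toy labels for ten trajectories (not benchmark data).
native_labels = [True] * 8 + [False] * 2
uv_labels = [True] * 6 + [False] * 2 + [True, False]
metrics = agreement_metrics(native_labels, uv_labels)
```

Kappa subtracts the agreement expected by chance from the raw accuracy, so under this reading it can stay low even when accuracy and F1 look high, for example when both verifiers label most trajectories as successful.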