Agent Trajectory Verification Agreement on WebTailBench
[Chart: Unterminated Rate, Native Verifier, Apr 5, 2026]
Updated 9d ago
Evaluation Results
Metric                                  Result 1                     Result 2
Method                                  Native Verifier              Native Verifier
Setup                                   Agent Model=GPT-5, N (...    Agent Model=Fara-7B, N...
Date                                    2026.04                      2026.04
Unterminated Rate                       7.7                          17
Success Rate (Native)                   62.5                         39.6
Success Rate (UV Process)               63.5                         39.6
Success Rate (UV Outcome)               39.9                         23.2
FNR (Native vs UV Process)              23                           30
FPR (Native vs UV Process)              37                           20
Accuracy (Native vs UV Process)         72                           76
F1 Score (Native vs UV Process)         78                           70
Cohen's Kappa (Native vs UV Process)    40                           50
FNR (Native vs UV Outcome)              17                           14
FPR (Native vs UV Outcome)              49                           25
Accuracy (Native vs UV Outcome)         64                           77
F1 Score (Native vs UV Outcome)         65                           64
Cohen's Kappa (Native vs UV Outcome)    31                           49
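The page does not define how the "Native vs UV" agreement metrics are computed. As a rough illustration only, the sketch below applies the standard textbook definitions of FNR, FPR, accuracy, F1, and Cohen's kappa to two lists of per-trajectory verdicts. It assumes that the UV verdict is treated as the reference label, that "success" is the positive class, and that results are fractions in [0, 1] rather than the scaled values shown in the table. The names `Agreement` and `agreement` are hypothetical and do not come from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    fnr: float
    fpr: float
    accuracy: float
    f1: float
    kappa: float


def agreement(native: list[bool], reference: list[bool]) -> Agreement:
    """Compare native-verifier verdicts with reference verdicts.

    Assumption: `reference` (e.g. a UV verdict) plays the role of ground
    truth and True means "task succeeded".
    """
    if len(native) != len(reference) or not native:
        raise ValueError("label lists must be non-empty and equally long")

    pairs = list(zip(native, reference))
    tp = sum(n and r for n, r in pairs)              # both say success
    tn = sum((not n) and (not r) for n, r in pairs)  # both say failure
    fp = sum(n and (not r) for n, r in pairs)        # native success, reference failure
    fn = sum((not n) and r for n, r in pairs)        # native failure, reference success
    total = len(pairs)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero for degenerate inputs.
        return num / den if den else 0.0

    accuracy = (tp + tn) / total
    # Agreement expected by chance, from each rater's marginal rates.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((tn + fn) / total) * ((tn + fp) / total)
    p_e = p_yes + p_no

    return Agreement(
        fnr=ratio(fn, fn + tp),
        fpr=ratio(fp, fp + tn),
        accuracy=accuracy,
        f1=ratio(2 * tp, 2 * tp + fp + fn),
        kappa=ratio(accuracy - p_e, 1 - p_e),
    )


# Toy verdicts for eight trajectories (made up for illustration, not benchmark data).
native_labels = [True, True, False, False, True, False, True, False]
uv_labels = [True, False, False, True, True, False, True, False]
result = agreement(native_labels, uv_labels)
print(result)
```

Kappa is worth reading alongside accuracy because it discounts the agreement two verifiers would reach by chance given their individual success rates. In the toy data above the raw agreement is 0.75 while kappa is 0.5.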