Agent Trajectory Verification Agreement on Mind2Web OM2W
[Chart: Unterminated Rate, Native Verifier, as of Apr 5, 2026; y-axis ticks from 4.912 to 6.694]
Updated 9d ago
Evaluation Results

Method: Native Verifier (both configurations). All values are percentages except Cohen's Kappa.

Metric                                  Agent Model=Fara-7B, N...   Agent Model=GPT-5, N (...
                                        (2026.04)                   (2026.04)
Unterminated Rate                       5                           7.2
Success Rate (Native Verifier)          32.2                        62
Success Rate (UV Process)               25.8                        64.9
Success Rate (UV Outcome)               15.8                        48.6
FNR (Native vs. UV Process)             26                          27
FPR (Native vs. UV Process)             18                          42
Accuracy (Native vs. UV Process)        80                          67
F1 Score (Native vs. UV Process)        66                          74
Cohen's Kappa (Native vs. UV Process)   0.52                        0.3
FNR (Native vs. UV Outcome)             17                          24
FPR (Native vs. UV Outcome)             23                          49
Accuracy (Native vs. UV Outcome)        78                          63
F1 Score (Native vs. UV Outcome)        55                          67
Cohen's Kappa (Native vs. UV Outcome)   0.42                        0.27
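The agreement columns (FNR, FPR, Accuracy, F1 Score, Cohen's Kappa) can all be derived from a 2x2 confusion matrix between two sets of pass/fail verdicts. The following is a minimal sketch, not the benchmark's own code: it assumes the UV verdicts are treated as the reference labels and the Native Verifier verdicts as the predictions, and the names `Agreement` and `agreement` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Agreement:
    fnr: float       # false negative rate: reference success judged as failure
    fpr: float       # false positive rate: reference failure judged as success
    accuracy: float  # fraction of tasks where both verdicts match
    f1: float        # F1 of the predicted "success" class
    kappa: float     # Cohen's kappa (chance-corrected agreement)


def agreement(native: Sequence[bool], reference: Sequence[bool]) -> Agreement:
    """Compare per-task verdicts (True = success).

    Assumption: `reference` plays the role of ground truth (e.g. UV verdicts)
    and `native` the role of the prediction (e.g. Native Verifier verdicts).
    """
    if len(native) != len(reference) or len(native) == 0:
        raise ValueError("verdict lists must be non-empty and of equal length")

    tp = sum(1 for n, r in zip(native, reference) if n and r)
    fp = sum(1 for n, r in zip(native, reference) if n and not r)
    fn = sum(1 for n, r in zip(native, reference) if not n and r)
    tn = sum(1 for n, r in zip(native, reference) if not n and not r)
    total = tp + fp + fn + tn

    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Expected agreement by chance, from the marginal success/failure rates.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (total * total)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return Agreement(fnr=fnr, fpr=fpr, accuracy=accuracy, f1=f1, kappa=kappa)


# Toy data (not from the benchmark): 10 tasks, verdicts differ on 2 of them.
native_verdicts = [True, True, True, False, False, False, False, False, False, False]
reference_verdicts = [True, True, False, True, False, False, False, False, False, False]
result = agreement(native_verdicts, reference_verdicts)
```

The function returns fractions in [0, 1]; multiplying the rates, accuracy, and F1 by 100 gives the percentage scale used in the table. Swapping which verdict set is the reference swaps the roles of false positives and false negatives, so FNR and FPR change while Accuracy, F1, and Cohen's Kappa stay the same.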