Agentic Person Search on Track 1 (Who)
[Figure: leaderboard chart of SR@1, SR@5, and Parse Accuracy; highlighted point: Oracle, SR@1 = 100, Apr 14, 2026]
Evaluation Results
Method         Backbone   Date      SR@1    SR@5    Parse Accuracy
-------------  ---------  --------  ------  ------  --------------
Oracle         GPT-4o     2026.04   100     100     100
LLM ToolCall   GPT-4o     2026.04   81.1    82.3    90.8
LLM Direct     GPT-4o     2026.04   73.3    73.3    93
Rule-based     GPT-4o     2026.04   32.2    48      66