Tool Use on BrowseComp Domain-specific (9) Search
[Chart: Accuracy and Task Completion Count by method. Highlighted point: Gold Oracle, 22.5 Accuracy, Feb 16, 2026.]
Updated 1mo ago
Evaluation Results
| Method | Configuration | Date | Accuracy | Task Completion Count |
|---|---|---|---|---|
| Gold Oracle | Model=GPT-5, Strategy=... | 2026.02 | 22.5 | 7 |
| TOOLOBSERVER | Model=GPT-5, Strategy=... | 2026.02 | 21.9 | 6.3 |
| EasyTool | Model=GPT-5, Strategy=... | 2026.02 | 18.9 | 8.2 |
| Base ReAct | Model=GPT-5, Strategy=... | 2026.02 | 18.1 | 10 |
| Play2Prompt | Model=GPT-5, Strategy=... | 2026.02 | 17.9 | 7.8 |
| Gold Oracle | Model=GPT-5-mini, Stra... | 2026.02 | 9.7 | 2.8 |
| EasyTool | Model=GPT-5-mini, Stra... | 2026.02 | 7.1 | 3.8 |
| TOOLOBSERVER | Model=GPT-5-mini, Stra... | 2026.02 | 3.2 | 3.5 |
| Base ReAct | Model=GPT-5-mini, Stra... | 2026.02 | 3.1 | 4.1 |
| Play2Prompt | Model=GPT-5-mini, Stra... | 2026.02 | 2.8 | 3.5 |