Agentic Search on DeepSearchQA
[Chart: Accuracy. Best result: FINEVERIFY, 90 (May 30, 2026).]
Evaluation Results
| Method | Settings (truncated in source) | Date | Accuracy |
|---|---|---|---|
| FINEVERIFY | Model=Gemini-3-flash,... | 2026.05 | 90 |
| Best-of-N | Model=Gemini-3-flash,... | 2026.05 | 89.5 |
| Confidence Verify | Model=Gemini-3-flash,... | 2026.05 | 89.5 |
| Solution Aggregation | Model=Gemini-3-flash,... | 2026.05 | 89 |
| Weighted Voting | Model=Gemini-3-flash,... | 2026.05 | 88.5 |
| Majority Voting | Model=Gemini-3-flash,... | 2026.05 | 85.5 |
| Pass@1 | Model=Gemini-3-flash,... | 2026.05 | 84 |
| Weighted Voting | Model=GPT-5-mini, Samp... | 2026.05 | 77 |
| FINEVERIFY | Model=GPT-5-mini, Samp... | 2026.05 | 77 |
| Best-of-N | Model=GPT-5-mini, Samp... | 2026.05 | 76 |
| Solution Aggregation | Model=GPT-5-mini, Samp... | 2026.05 | 76 |
| Confidence Verify | Model=GPT-5-mini, Samp... | 2026.05 | 74.5 |
| Majority Voting | Model=GPT-5-mini, Samp... | 2026.05 | 72 |
| Pass@1 | Model=GPT-5-mini, Samp... | 2026.05 | 70.5 |