Domain Deep Research Tool Use on FinSearchComp Global-T2
Best result: 70.6 Success Rate (Gemini-3-Pro)
[Chart: Success Rate over time, latest point Mar 10, 2026]
Updated 1mo ago
Evaluation Results
Method              Category           Date     Success Rate
Gemini-3-Pro        Frontier (≫8B...   2026.03  70.6
DIVE-8B (RL)        Ours, Tempera...   2026.03  67.3
DIVE-8B (SFT)       Ours, Tempera...   2026.03  62.1
DeepSeek-V3.2-Exp   Frontier (≫8B...   2026.03  61.3
GPT-OSS-120B        Frontier (≫8B...   2026.03  61.0
Claude-4-Sonnet     Frontier (≫8B...   2026.03  60.2
Kimi-K2-0905        Frontier (≫8B...   2026.03  47.1
Gemini-2.5-Pro      Frontier (≫8B...   2026.03  44.5
EnvScaler-8B        8B Baselines,...   2026.03  40.7
WebExplorer-8B      8B Baselines,...   2026.03  35.9
SWE-Dev-8B          8B Baselines,...   2026.03  30.5
Qwen3-8B (base)     Ours, Tempera...   2026.03  28.6
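The Success Rate figures in the results table are easiest to compare as point differences. Below is a minimal Python sketch, assuming Success Rate is reported on a 0 to 100 scale. The scores are copied from the table, and the helper names `gap` and `ranking` are hypothetical, chosen only for illustration. The sketch ranks the entries and computes a few gaps, for example how far DIVE-8B (RL) moves above its Qwen3-8B base.

```python
# Scores transcribed from the FinSearchComp Global-T2 results table (Success Rate).
# Assumption: values are on a 0-100 scale; helper names below are hypothetical.
RESULTS = {
    "Gemini-3-Pro": 70.6,
    "DIVE-8B (RL)": 67.3,
    "DIVE-8B (SFT)": 62.1,
    "DeepSeek-V3.2-Exp": 61.3,
    "GPT-OSS-120B": 61.0,
    "Claude-4-Sonnet": 60.2,
    "Kimi-K2-0905": 47.1,
    "Gemini-2.5-Pro": 44.5,
    "EnvScaler-8B": 40.7,
    "WebExplorer-8B": 35.9,
    "SWE-Dev-8B": 30.5,
    "Qwen3-8B (base)": 28.6,
}


def gap(a: str, b: str) -> float:
    """Absolute Success Rate difference (a minus b), rounded to one decimal
    place to hide floating-point noise such as 38.699999..."""
    return round(RESULTS[a] - RESULTS[b], 1)


def ranking() -> list[tuple[str, float]]:
    """Entries sorted from highest to lowest Success Rate."""
    return sorted(RESULTS.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for rank, (name, score) in enumerate(ranking(), start=1):
        print(f"{rank:2d}. {name:<20} {score:5.1f}")
    print("RL over base:", gap("DIVE-8B (RL)", "Qwen3-8B (base)"))        # 38.7
    print("RL over SFT:", gap("DIVE-8B (RL)", "DIVE-8B (SFT)"))           # 5.2
    print("RL over best 8B baseline:", gap("DIVE-8B (RL)", "EnvScaler-8B"))  # 26.6
    print("Top entry over RL:", gap("Gemini-3-Pro", "DIVE-8B (RL)"))      # 3.3
```

Rounding the difference, rather than the inputs, keeps the stored scores exactly as listed while still printing clean one-decimal gaps.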