Multimodal Deep Search on HLE-VL
[Chart: Accuracy (HLE-VL) over time. Top result: Gemini-2.5 Pro, 19, May 11, 2026.]
Evaluation Results
Method                  Evaluation Setting  Date     Accuracy (HLE-VL)
----------------------  ------------------  -------  -----------------
Gemini-2.5 Pro          Dir...              2026.05  19
GPT-5                   Age...              2026.05  18.1
Gemini-2.5 Pro          Age...              2026.05  17.3
GPT-5                   Dir...              2026.05  15.8
WebWatcher-32B          Dee...              2026.05  13.6
Gemini-2.5 Flash        Dir...              2026.05  12
Qwen3-VL-8B + ODE-RL    Vis...              2026.05  11.4
WebWatcher-7B           Dee...              2026.05  10.6
Qwen3-VL-30B + ODE-RL   Vis...              2026.05  10.5
Qwen3-VL-30B + ODE-SFT  Vis...              2026.05  9.9
Gemini-2.5 Flash        Age...              2026.05  8.5
Qwen3-VL-30B            Vis...              2026.05  8.5
Qwen3-VL-8B + ODE-SFT   Vis...              2026.05  8.2
Qwen3-VL-8B             Dir...              2026.05  6.1
Qwen3-VL-8B             Vis...              2026.05  6.1
Qwen3-VL-30B            Dir...              2026.05  5
Qwen3-VL-8B             Age...              2026.05  5
Qwen3-VL-30B            Age...              2026.05  4.4
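A minimal sketch for reading the table programmatically: it transcribes the 18 rows above and computes how far the ODE-SFT and ODE-RL variants move each Qwen3-VL base model. It assumes the accuracies are percentages and that rows sharing the truncated label `Vis...` were evaluated under the same setting; the leaderboard truncates the setting names, so that comparability is an assumption, not something the table confirms. The helper names `ROWS`, `score`, and `gain` are hypothetical and not part of any leaderboard API.

```python
# Rows transcribed from the Evaluation Results table.
# Setting labels are kept truncated exactly as the leaderboard shows them;
# treating equal labels as the same setting is an assumption.
ROWS: list[tuple[str, str, float]] = [
    ("Gemini-2.5 Pro", "Dir...", 19.0),
    ("GPT-5", "Age...", 18.1),
    ("Gemini-2.5 Pro", "Age...", 17.3),
    ("GPT-5", "Dir...", 15.8),
    ("WebWatcher-32B", "Dee...", 13.6),
    ("Gemini-2.5 Flash", "Dir...", 12.0),
    ("Qwen3-VL-8B + ODE-RL", "Vis...", 11.4),
    ("WebWatcher-7B", "Dee...", 10.6),
    ("Qwen3-VL-30B + ODE-RL", "Vis...", 10.5),
    ("Qwen3-VL-30B + ODE-SFT", "Vis...", 9.9),
    ("Gemini-2.5 Flash", "Age...", 8.5),
    ("Qwen3-VL-30B", "Vis...", 8.5),
    ("Qwen3-VL-8B + ODE-SFT", "Vis...", 8.2),
    ("Qwen3-VL-8B", "Dir...", 6.1),
    ("Qwen3-VL-8B", "Vis...", 6.1),
    ("Qwen3-VL-30B", "Dir...", 5.0),
    ("Qwen3-VL-8B", "Age...", 5.0),
    ("Qwen3-VL-30B", "Age...", 4.4),
]


def score(method: str, setting: str) -> float:
    """Return the HLE-VL accuracy for one (method, setting) pair."""
    for m, s, acc in ROWS:
        if m == method and s == setting:
            return acc
    raise KeyError((method, setting))


def gain(variant: str, base: str, setting: str) -> float:
    """Accuracy difference in percentage points, rounded to one decimal
    to hide floating-point noise from the subtraction."""
    return round(score(variant, setting) - score(base, setting), 1)


if __name__ == "__main__":
    for size in ("8B", "30B"):
        base = f"Qwen3-VL-{size}"
        for suffix in ("ODE-SFT", "ODE-RL"):
            variant = f"{base} + {suffix}"
            delta = gain(variant, base, "Vis...")
            print(f"{variant}: {delta:+.1f} points over {base}")
```

Under these assumptions the 8B model gains +2.1 points with ODE-SFT and +5.3 with ODE-RL, while the 30B model gains +1.4 and +2.0. Rounding inside `gain` keeps results like `8.2 - 6.1` from printing as `2.0999999999999996`.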