Visual Document Reasoning on VDR
[Chart: Accuracy on VDR over time. Best result: Claude-3.7-Sonnet, 27.2 Accuracy, May 11, 2026.]
Evaluation Results
Method                  Evaluation Setting  Date     Accuracy
----------------------  ------------------  -------  --------
Claude-3.7-Sonnet       Age...              2026.05      27.2
Qwen3-VL-30B + ODE-RL   Vis...              2026.05      26.4
Qwen3-VL-30B + ODE-SFT  Vis...              2026.05      24.0
Qwen3-VL-8B + ODE-RL    Vis...              2026.05      20.4
Qwen3-VL-8B + ODE-SFT   Vis...              2026.05      19.2
GPT-5                   Age...              2026.05      17.6
Claude-4-Sonnet         Age...              2026.05      13.6
Qwen3-VL-30B            Vis...              2026.05      11.0
GPT-5                   Dir...              2026.05      10.8
Gemini-2.5 Pro          Age...              2026.05      10.0
Gemini-2.5 Pro          Dir...              2026.05       8.0
Gemini-2.5 Flash        Age...              2026.05       7.8
Gemini-2.5 Flash        Dir...              2026.05       6.2
Qwen3-VL-8B             Age...              2026.05       5.0
Claude-3.7-Sonnet       Dir...              2026.05       4.6
Qwen3-VL-30B            Age...              2026.05       4.4
Qwen3-VL-8B             Vis...              2026.05       4.2
Qwen3-VL-30B            Dir...              2026.05       3.8
Qwen3-VL-8B             Dir...              2026.05       2.8
Claude-4-Sonnet         Dir...              2026.05       2.0