General Reasoning on AI2D
Top result: 84.1 accuracy (Qwen3-VL-8B)
[Chart: accuracy over time, May 12 to May 19, 2026]
Updated 14d ago
Evaluation Results
| Method | Configuration | Date | Accuracy |
|---|---|---|---|
| Qwen3-VL-8B | Mode=base | 2026.05 | 84.1 |
| S^3-FT | Backbone=Qwen3-VL-8B,... | 2026.05 | 83.6 |
| S^3-FT | Backbone=Qwen3-VL-8B,... | 2026.05 | 82.6 |
| Qwen2.5-VL-7B | Mode=base | 2026.05 | 80.9 |
| S^3-FT | Backbone=Qwen2.5-VL-7B... | 2026.05 | 80.9 |
| S^3-FT | Backbone=Qwen2.5-VL-7B... | 2026.05 | 80.6 |
| Nash | Strategy=Step-wise ver... | 2026.05 | 78.95 |
| LLaVA Critic | Strategy=Step-wise ver... | 2026.05 | 76.61 |
| Base | Strategy=Step-wise ver... | 2026.05 | 76.52 |
| Sherlock | Strategy=Step-wise ver... | 2026.05 | 61.54 |
| VisionSR1 | Strategy=Step-wise ver... | 2026.05 | 60.32 |