Multimodal Reasoning on BabyVision (test)
[Chart: Accuracy by date. Top entry: Gemini-3-Pro, 49.7 (Jun 1, 2026)]
Evaluation Results
| Method | Details | Date | Accuracy |
|---|---|---|---|
| Gemini-3-Pro | Protocol=Zero-shot | 2026.06 | 49.7 |
| GPT-5.2 | Protocol=Zero-shot | 2026.06 | 34.4 |
| Doubao-1.8 | Protocol=Zero-shot | 2026.06 | 30.2 |
| ATLAS-MM | Backbone=Claude Sonnet... | 2026.06 | 23.97 |
| ATLAS | Backbone=Claude Sonnet... | 2026.06 | 23.71 |
| Majority Voting | Backbone=Claude Sonnet... | 2026.06 | 23.2 |
| Budget Forcing | Backbone=Claude Sonnet... | 2026.06 | 21.91 |
| Self-Refine | Backbone=Claude Sonnet... | 2026.06 | 21.39 |
| Self-Refine (no early stop) | Backbone=Claude Sonnet... | 2026.06 | 21.13 |
| Reward-model reranking | Backbone=Claude Sonnet... | 2026.06 | 20.88 |
| Pass@1 | Backbone=Claude Sonnet... | 2026.06 | 19.59 |
| Qwen3-VL-Plus | Protocol=Zero-shot | 2026.06 | 19.2 |
| Grok-4 | Protocol=Zero-shot | 2026.06 | 16.2 |