Multimodal Reasoning on BabyVision (test)

Top result: Gemini-3-Pro, 49.7 accuracy

Evaluation Results

Method	Date	Accuracy
Gemini-3-Pro	2026.06	49.7
GPT-5.2	2026.06	34.4
Doubao-1.8	2026.06	30.2
ATLAS-MM	2026.06	23.97
ATLAS	2026.06	23.71
Majority Voting	2026.06	23.2
Budget Forcing	2026.06	21.91
Self-Refine	2026.06	21.39
Self-Refine (no early stop)	2026.06	21.13
Reward-model reranking	2026.06	20.88
Pass@1	2026.06	19.59
Qwen3-VL-Plus	2026.06	19.2
Grok-4	2026.06	16.2