
Human Preference Agreement on MM-RewardBench2

Accuracy: 79.2

Gemini 3.1 Pro + ARR

Updated 2mo ago

Evaluation Results

Method	Date	Links	Accuracy
Gemini 3.1 Pro + ARR	2026.05		79.2
GPT-5 + ARR	2026.05		77.5
Gemini 3.1 Pro	2026.05		77.4
GPT-5	2026.05		73.8
EditReward	2026.05		67.2
Qwen3-VL-8B + ARR	2026.05		65.5
Qwen3-VL-8B	2026.05		59.2