
Multimodal Evaluation Consistency on MLLM-as-a-Judge, RichHF-18K, GenAI-Bench

Best Average Score: 44.2 (GPT-4o)

Updated 2mo ago

Evaluation Results

Method               Date     Average Score
GPT-4o               2025.06  44.2
Minos                2025.06  42.3
Minos(8B)            2025.06  42.3
Gemini-2.5-Pro       2025.06  41.5
Gemini-2.5-Pro       2025.06  41.5
LLaVA-Critic         2025.06  39.8
Qwen3-VL             2025.06  38.4
Qwen3-VL(8B)         2025.06  38.4
UnifiedReward_Q      2025.06  37.2
UnifiedReward_Q(8B)  2025.06  37.2
GPT-4o               2025.06  35.9
LLaVA-Critic(72B)    2025.06  35.4
UnifiedReward_L      2025.06  33.6
UnifiedReward_L(7B)  2025.06  33.6
LLaVA-Critic         2025.06  30.7
LLaVA-OV             2025.06  30.0
LLaVA-Critic(7B)     2025.06  27.8
LLaVA-OV(72B)        2025.06  26.8
LLaVA-OV(7B)         2025.06  21.9
Prometheus-V         2025.06  20.3
Prometheus-V(7B)     2025.06  16.2
LLaVA-OV             2025.06  14.6
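The page reports one Average Score per entry but does not say how the three benchmarks (MLLM-as-a-Judge, RichHF-18K, GenAI-Bench) are combined. The following is a minimal sketch that assumes the score is an unweighted mean of per-benchmark consistency scores, rounded to one decimal place. This is an assumption, not the leaderboard's documented method. The function name `average_score` and the input values are hypothetical placeholders, not real results.

```python
from statistics import fmean

# Benchmarks named in this leaderboard's title.
BENCHMARKS = ("MLLM-as-a-Judge", "RichHF-18K", "GenAI-Bench")


def average_score(per_benchmark: dict[str, float]) -> float:
    """Unweighted mean of per-benchmark consistency scores, rounded to 0.1.

    ASSUMPTION (hypothetical helper): the leaderboard's "Average Score" is a
    plain mean over the three benchmarks; the page does not state its rule.
    """
    missing = [b for b in BENCHMARKS if b not in per_benchmark]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return round(fmean(per_benchmark[b] for b in BENCHMARKS), 1)


# Placeholder inputs for illustration only, not real leaderboard results.
example = {"MLLM-as-a-Judge": 40.0, "RichHF-18K": 45.0, "GenAI-Bench": 50.0}
print(average_score(example))  # 45.0
```

If the leaderboard instead weights benchmarks by size or uses a different consistency metric per benchmark, the mean above would not reproduce the listed numbers.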