
Large Multimodal Model Evaluation on MLLM-as-a-Judge v1.0 (test)

Best Overall Score: 49 (GPT-4V)

Updated 5mo ago

Evaluation Results

Method	Date	Overall Score	(unlabeled)	(unlabeled)
GPT-4V	2024.10	49.0	63.6	77.3
GPT-4o	2024.10	43.9	57.7	73.6
GPT-4V	2024.10	42.4	53.8	71.7
LLaVA-Critic-72B	2024.10	39.3	57.8	71.5
LLaVA-Critic-7B	2024.10	31.4	55.6	68.9
LLaVA-Critic-7B	2024.10	31.2	54.6	67.5
Gemini-pro	2024.10	30.4	50.9	61.5
LLaVA-OV-72B	2024.10	28.7	51.3	70.1
LLaVA-Critic	2024.10	27.2	54.7	67.7
Qwen2-VL-7B-Instruct	2024.10	25.3	34.8	64.5
LLaMA3.2-V	2024.10	23.7	52.9	65.8
LLaVA-Critic	2024.10	22.8	52.8	65.6
Prometheus-V	2024.10	21.3	-	-
LLaVA-NeXT	2024.10	19.8	46.1	58.6
LLaVA-v1.5-7B	2024.10	15.8	43.9	57.6
LLaVA-OV-7B	2024.10	15.1	42.6	55.0