
Human feedback evaluation consistency on RichHF-18K

40.4 Pearson-r

UnifiedReward_Q(8B)

Updated 2mo ago

Evaluation Results

Method	Date	Pearson-r
UnifiedReward_Q(8B)	2025.06	40.4
UnifiedReward_L(7B)	2025.06	39.9
Gemini-2.5-Pro	2025.06	39.7
Qwen3-VL(8B)	2025.06	38.9
Minos(8B)	2025.06	36
LLaVA-Critic(72B)	2025.06	33
GPT-4o	2025.06	31.1
LLaVA-OV(72B)	2025.06	27.2
LLaVA-Critic(7B)	2025.06	18.4
Prometheus-V(7B)	2025.06	8.19
LLaVA-OV(7B)	2025.06	5.85
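This page does not describe how the consistency score is computed. The following is a minimal sketch that assumes it is the sample Pearson correlation between a model's scalar scores and the RichHF-18K human ratings for the same items. Because Pearson r lies in [-1, 1] while the table lists values such as 40.4, it also assumes the reported numbers are r multiplied by 100. The function names `pearson_r` and `leaderboard_score` are hypothetical, and the sketch does not reproduce any benchmark-specific choices such as which rating field is used or how items are aggregated.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two sequences around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (the 1/n factors cancel in the ratio).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def leaderboard_score(model_scores: Sequence[float],
                      human_scores: Sequence[float]) -> float:
    """Pearson r scaled by 100 (assumed scale of the table above, hypothetical helper)."""
    return 100.0 * pearson_r(model_scores, human_scores)


if __name__ == "__main__":
    # Hypothetical toy data: reward-model scores vs. human ratings for four images.
    model = [0.9, 0.4, 0.7, 0.1]
    human = [4.5, 2.0, 4.0, 1.0]
    print(f"{leaderboard_score(model, human):.1f}")
```

Under this reading, a score of 100 would mean that the model's scores are a perfect increasing linear function of the human ratings, and 0 would mean that there is no linear relationship.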