Human feedback evaluation consistency on RichHF-18K
[Chart: Pearson-r by date. Top entry: UnifiedReward_Q(8B), 40.4 (Jun 3, 2025)]
Updated 1mo ago
Evaluation Results
| Method | Date | Pearson-r |
| --- | --- | --- |
| UnifiedReward_Q(8B) | 2025.06 | 40.4 |
| UnifiedReward_L(7B) | 2025.06 | 39.9 |
| Gemini-2.5-Pro | 2025.06 | 39.7 |
| Qwen3-VL(8B) | 2025.06 | 38.9 |
| Minos(8B) | 2025.06 | 36 |
| LLaVA-Critic(72B) | 2025.06 | 33 |
| GPT-4o | 2025.06 | 31.1 |
| LLaVA-OV(72B) | 2025.06 | 27.2 |
| LLaVA-Critic(7B) | 2025.06 | 18.4 |
| Prometheus-V(7B) | 2025.06 | 8.19 |
| LLaVA-OV(7B) | 2025.06 | 5.85 |
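
For readers unfamiliar with the metric, the following is a minimal sketch of how a Pearson correlation between model-assigned scores and human feedback scores could be computed. It is not the benchmark's evaluation script: the function name `pearson_r` and the two score lists are hypothetical, and the final multiplication by 100 assumes the leaderboard reports the coefficient on a 0 to 100 scale, which the page does not state explicitly.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five images; NOT taken from RichHF-18K.
human_scores = [0.2, 0.5, 0.7, 0.9, 0.4]
model_scores = [0.3, 0.4, 0.8, 0.7, 0.5]

r = pearson_r(human_scores, model_scores)
# Assumption: the leaderboard value is the coefficient scaled by 100.
print(f"Pearson r = {r:.3f} (scaled: {100 * r:.1f})")
```

A coefficient near 1 would mean the model's scores rise and fall in step with the human ratings; a value near 0 would indicate little linear agreement.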