Linguistic correctness of descriptions on XM3600
Figure: Preference Rate over time. Top result: LLaVA-PLLuM-12B-nc at 85.81 (Feb 15, 2026).
Evaluation Results
Method                     Setting                    Date     Preference Rate
LLaVA-PLLuM-12B-nc         Judge=Llama-3.3-70B-In...  2026.02  85.81
LLaVA-PLLuM-12B-nc-250715  Judge=Llama-3.3-70B-In...  2026.02  84.91
LLaVA-Bielik-11B-v2.6      Judge=Llama-3.3-70B-In...  2026.02  82.35
LLaVA-PLLuM-12B-nc         Judge=Llama-3.3-70B-In...  2026.02  77.53
LLaVA-PLLuM-12B-nc-250715  Judge=Llama-3.3-70B-In...  2026.02  77.47
LLaVA-Bielik-11B-v2.6      Judge=Llama-3.3-70B-In...  2026.02  74.1
LLaVA-PLLuM-12B-nc         Judge=Llama-3.3-70B-In...  2026.02  66.71
LLaVA-PLLuM-12B-nc-250715  Judge=Llama-3.3-70B-In...  2026.02  63.64
LLaVA-Bielik-11B-v2.6      Judge=Llama-3.3-70B-In...  2026.02  60.32
LLaVA-PLLuM-12B-nc         Judge=Llama-3.3-70B-In...  2026.02  48.33
LLaVA-PLLuM-12B-nc-250715  Judge=Llama-3.3-70B-In...  2026.02  43.38
LLaVA-PLLuM-12B-nc         Judge=Llama-3.3-70B-In...  2026.02  43.15
LLaVA-PLLuM-12B-nc-250715  Judge=Llama-3.3-70B-In...  2026.02  42.69
LLaVA-Bielik-11B-v2.6      Judge=Llama-3.3-70B-In...  2026.02  40.31
LLaVA-Bielik-11B-v2.6      Judge=Llama-3.3-70B-In...  2026.02  34.76