Vision Language Model Evaluation on Sycophancy Benchmark
[Chart: Mean Score per model. Highlighted entry: LFM2-VL, 88.8, Apr 27, 2026. Selectable metrics: Mean Score, Score Std Dev, Bluffing Coefficient, Evidence Recall, Sycophancy Rate, Honest Critic Rate.]
Evaluation Results

Method          Params  Date     Mean Score  Score Std Dev  Bluffing Coefficient  Evidence Recall  Sycophancy Rate  Honest Critic Rate
LFM2-VL         450M    2026.04  88.8        7.1            43                    45.9             22.28            0.09
Gemma-3         4B      2026.04  86          11.5           41.4                  45.1             11.09            1.06
Qwen2-VL        7B      2026.04  82.8        5.8            29.8                  53.7             10.47            0.01
LLaVA-1.6       7B      2026.04  73.7        19.1           21.2                  54.9             5.98             6.41
Phi-3.5-Vision  4.2B    2026.04  61.9        35.2           26.5                  40.1             12.01            18.7
MiniCPM-V-4.5   8B      2026.04  56          22.3           26.8                  35.3             8.45             22.62
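
The results are listed in descending order of Mean Score, but the ordering changes under other metrics. As a minimal sketch, the Python below re-ranks the six models by any column. The names `Row`, `ROWS`, and `rank_by` are hypothetical and introduced only for illustration. The page does not say whether higher or lower is better for each metric, so the sort direction is left to the caller.

```python
from typing import NamedTuple


class Row(NamedTuple):
    # Hypothetical container; fields mirror the leaderboard columns.
    method: str
    params: str
    mean_score: float
    score_std: float
    bluffing_coefficient: float
    evidence_recall: float
    sycophancy_rate: float
    honest_critic_rate: float


# Values copied from the results listed above.
ROWS = [
    Row("LFM2-VL", "450M", 88.8, 7.1, 43.0, 45.9, 22.28, 0.09),
    Row("Gemma-3", "4B", 86.0, 11.5, 41.4, 45.1, 11.09, 1.06),
    Row("Qwen2-VL", "7B", 82.8, 5.8, 29.8, 53.7, 10.47, 0.01),
    Row("LLaVA-1.6", "7B", 73.7, 19.1, 21.2, 54.9, 5.98, 6.41),
    Row("Phi-3.5-Vision", "4.2B", 61.9, 35.2, 26.5, 40.1, 12.01, 18.7),
    Row("MiniCPM-V-4.5", "8B", 56.0, 22.3, 26.8, 35.3, 8.45, 22.62),
]


def rank_by(rows, field, descending=True):
    """Return method names ordered by the given column."""
    ordered = sorted(rows, key=lambda r: getattr(r, field), reverse=descending)
    return [r.method for r in ordered]


# Ranking by Mean Score reproduces the order shown on the page.
by_mean = rank_by(ROWS, "mean_score")

# Ascending Sycophancy Rate gives a different order (direction is the caller's assumption).
by_sycophancy = rank_by(ROWS, "sycophancy_rate", descending=False)

print(by_mean)
print(by_sycophancy)
```

Sorted by ascending Sycophancy Rate, LLaVA-1.6 moves to the top and LFM2-VL to the bottom, which is the reverse of LFM2-VL's position under Mean Score.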