Sycophancy Evaluation on VISE 1.0 (test)
[Chart: Strong Bias by model over time (default view); highlighted result: Gemini-2.5-Flash, 64.84, Jun 8, 2025.]
Evaluation Results
Model | Method | Date | Strong Bias | Medium Bias | Suggestive Bias | Are You Sure Rate | Explicit Rejection Rate | Explicit Endorsement Rate | Mimicry Rate | Maximum Observed Rate | Average Rate
Gemini-2.5-Flash | evaluation_protocol=MSS | 2025.06 | 64.84 | 60.63 | 61.83 | 59.72 | 54.57 | 60.69 | 88.43 | 88.43 | 64.39
Gemini-1.5-Pro | evaluation_protocol=MSS | 2025.06 | 58.04 | 33.96 | 47.94 | 42.05 | 41.83 | 19.59 | 22.39 | 58.04 | 37.97
GPT-4o mini | evaluation_protocol=MSS | 2025.06 | 8.72 | 7.72 | 9.53 | 6.76 | 11.76 | 6.69 | 45.96 | 45.96 | 13.88
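The Maximum Observed Rate and Average Rate columns appear to be derived from the seven per-metric rates. In every row, the maximum equals the largest of the seven values and the average matches their arithmetic mean to two decimals. This is an inference from the numbers, not a documented definition. The Python sketch below recomputes both aggregates under that assumption and checks them against the reported values. The names `RATES`, `REPORTED`, and `aggregates` are hypothetical and used only for illustration.

```python
import math
from statistics import mean

# Per-metric rates copied from the results table, in column order:
# Strong Bias, Medium Bias, Suggestive Bias, Are You Sure Rate,
# Explicit Rejection Rate, Explicit Endorsement Rate, Mimicry Rate.
RATES = {
    "Gemini-2.5-Flash": [64.84, 60.63, 61.83, 59.72, 54.57, 60.69, 88.43],
    "Gemini-1.5-Pro": [58.04, 33.96, 47.94, 42.05, 41.83, 19.59, 22.39],
    "GPT-4o mini": [8.72, 7.72, 9.53, 6.76, 11.76, 6.69, 45.96],
}

# Reported aggregate columns: (Maximum Observed Rate, Average Rate).
REPORTED = {
    "Gemini-2.5-Flash": (88.43, 64.39),
    "Gemini-1.5-Pro": (58.04, 37.97),
    "GPT-4o mini": (45.96, 13.88),
}


def aggregates(rates):
    """Return (max, mean rounded to 2 decimals).

    ASSUMPTION: this definition is inferred from the table's numbers,
    not taken from the benchmark's documentation.
    """
    return max(rates), round(mean(rates), 2)


for model, rates in RATES.items():
    max_rate, avg_rate = aggregates(rates)
    rep_max, rep_avg = REPORTED[model]
    matches = max_rate == rep_max and math.isclose(avg_rate, rep_avg, abs_tol=0.005)
    print(f"{model}: max={max_rate} avg={avg_rate} matches_reported={matches}")
```

Under this reading, a single high metric can pull up a model's Average Rate. For example, GPT-4o mini's Mimicry Rate (45.96) sits far above its other six rates, all of which are below 12.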