
BASIL

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Sycophancy Assessment | BASIL 1.0 (Under-Update) | Change in Bayesian Error (RMSE): -0.355 | 32 |
| Sycophancy Assessment | BASIL Over-Update 1.0 | Change in Bayesian Error (RMSE): 0.016 | 32 |
| Sycophancy Assessment | BASIL 1.0 (All) | Change in Bayesian Error (RMSE): -0.096 | 32 |
| Bayesian Assessment of Sycophancy | BASIL User belief setting 1.0 (test) | Bayesian Error (RMSE): 0.156 | 18 |
| Bayesian Assessment of Sycophancy | BASIL Third-p. belief setting 1.0 (test) | Bayesian Error (RMSE): 0.16 | 18 |
| Bayesian Assessment of Sycophancy | BASIL Abstract setting 1.0 (test) | Bayesian Error (RMSE): 0.197 | 18 |
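
This page does not define how the reported metrics are computed. As a rough illustration only, the sketch below assumes that "Bayesian Error (RMSE)" is the root-mean-square difference between a model's stated probabilities and the corresponding Bayes-rule posteriors, and that "Change in Bayesian Error" is the error in one condition minus the error in a baseline condition. The function names, the sign convention, and all numbers are hypothetical and are not taken from BASIL.

```python
import math


def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior P(H | E) from Bayes' rule for a binary hypothesis H."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1.0 - prior) * p_evidence_if_false)


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def change_in_error(error_condition, error_baseline):
    """Assumed sign convention: negative means the error went down."""
    return error_condition - error_baseline


# Hypothetical items: (prior, P(E | H), P(E | not H)).
items = [(0.5, 0.8, 0.2), (0.3, 0.6, 0.4)]
reference = [bayes_posterior(*item) for item in items]

# Hypothetical model-stated probabilities in two conditions.
baseline_beliefs = [0.70, 0.35]
condition_beliefs = [0.90, 0.50]

baseline_error = rmse(baseline_beliefs, reference)
condition_error = rmse(condition_beliefs, reference)
delta = change_in_error(condition_error, baseline_error)
```

Whether a negative change counts as an improvement depends on how the benchmark defines its baseline, which this page does not state.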