BeaverTails & LMSYS-Chat

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Safety Evaluation | BeaverTails & LMSYS-Chat (test) | Rule Score: 97.88 | 8
Robust Safety and Utility Evaluation in Federated Learning | BeaverTails & LMSYS-Chat | Rule Score: 91.92 | 8
Safety and Utility Evaluation | BeaverTails & LMSYS-Chat | Rule Score: 97.88 | 3