MaliciousGen & LMSYS-Chat

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety Evaluation | MaliciousGen & LMSYS-Chat (test) | Rule Score: 97.31 | 8 |
| Robust Safety and Utility Evaluation in Federated Learning | MaliciousGen & LMSYS-Chat | Rule Compliance: 92.5 | 8 |
| Safety and Utility Evaluation | MaliciousGen & LMSYS-Chat | Rule Score: 97.31 | 3 |