
ASSE-Safety

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Trajectory-level safety evaluation | ASSE-Safety (test) | Accuracy 81.1 | 20 |
| Agent Safety Auditing | ASSE-Safety | Accuracy 85.4 | 13 |
| Binary safe/unsafe classification | ASSE-Safety (test) | Accuracy 67.4 | 4 |