
ASSEBench

Benchmarks

Task Name               Dataset Name         SOTA Result      Trend
Agent Safety            ASSEBench            Accuracy 92.04   69
Agent Safety Reasoning  ASSEBench-Corrected  Accuracy 84.72   25
Safety classification   ASSEBench            Accuracy 92.04   20
Safety classification   ASSEBench (test)     Accuracy 89.97   12