Share your thoughts, 1 month free Claude Pro on usSee more
WorkDL logo mark

Risk Assessment

Benchmarks

Task NameDataset NameSOTA ResultTrend
Activation SteeringRisk Assessment
Steering Effect (Delta s)0.49
5
Risk AssessmentRisk Assessment benchmark (500-sample)
Macro F1 (> 1)78.7
5
Risk AssessmentRisk Assessment 500-sample
Data Exfiltration85.8
5
Showing 3 of 3 rows