Security

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Tool-use security and utility evaluation | Security | Utility Rate: 98.4 | 16 |
| Task Routing | Security | Cost ($): 0.01 | 15 |
| Misaligned Task Learning | Security In-domain | Misalignment: 2.1 | 6 |
| Emergent Misalignment Measurement | Security General evaluation | Misalignment Score: 1.21 | 6 |
| Explainable AI Performance Evaluation | Security | Composite Score (Entropy-Weighted, Domain-Modulated): 2.98 | 5 |
| Forecasting | Security | MSE: 107.554 | 3 |
| Task-Efficient Routing | Security Curated Task Benchmark 1.0 (test) | Avg. Cost: 0.0021 | 3 |