
Security

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
Task Routing | Security | Cost ($): 0.01 | 15
Misaligned Task Learning | Security In-domain | Misalignment: 2.1 | 6
Emergent Misalignment Measurement | Security General evaluation | Misalignment Score: 1.21 | 6
Explainable AI Performance Evaluation | Security | Composite Score (Entropy-Weighted, Domain-Modulated): 2.98 | 5
Task-Efficient Routing | Security Curated Task Benchmark 1.0 (test) | Avg. Cost: 0.0021 | 3