630-scenario real-world benchmark

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Safety Guardrail Classification | 630-scenario real-world benchmark (independent set) | Verdict Accuracy: 95.4 | 5 |