S-Eval, ORFuzzSet, and NQ

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Safety-Utility Trade-off Evaluation | S-Eval, ORFuzzSet, and NQ Aggregated | F1 Score: 86.81 | 72 |