
Scenario A

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Guardrail Performance Evaluation | Scenario A Multi-Step Financial Transfer | Accuracy: 100 | 15 |
| Selective Risk Control | Scenario A (linear, heteroskedastic) | FCR: 3.4 | 8 |
| Simultaneous Exploration and Inspection | Scenario A | Finish Rate Avg: 97.6 | 7 |
| Distributed Training and Task Allocation Optimization | Scenario A 25 Vehicles | Decision Time (s): 0.435 | 5 |
| Local Navigation | Scenario A With External Disturbance 100 independent trials (Simulation) | Success Rate: 92 | 4 |
| Local Navigation | Scenario A No External Disturbance Simulation 100 independent trials | Success Rate: 100 | 4 |
| Compliance Assessment | Scenario A Low Impact 1.0 (test) | Compliance Score: 0.99 | 3 |
| Ad Recommendation | Scenario A Online A/B Test | Revenue Lift: 8.356 | 1 |