Scenario

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Perceived Risk Prediction | Scenario MB | RMSE: 0.2391 | 81 |
| Binary Classification | Scenario 3 (val) | Delta TCC: 253 | 24 |
| Binary Classification | Scenario 2 (val) | Delta TCC: 591 | 24 |
| Binary Classification | Scenario 1 (val) | Delta TCC: 0 | 24 |
| Regression | Scenario IS2 | Size: 0 | 24 |
| Classification | Scenario IS1 | Model Size: 0 | 24 |
| Change point localization | Scenario 5 | Mismatch Proportion (K!=K): 0.055 | 20 |
| Change point localization | Scenario 3 | Error Proportion (K_hat != K): 70.5 | 20 |
| Quantile Regression | Scenario 3 n=10000 | MSE (tau=0.05): 0.6307 | 16 |
| Quantile Regression | Scenario 3 n=5000 | MSE (τ=0.05): 0.8839 | 16 |
| Quantile Regression | Scenario 3 n=1000 | MSE (τ=0.05): 1.9425 | 16 |
| Quantile Regression | Scenario 2 n=10000 | MSE (τ=0.05): 0.1008 | 16 |
| Quantile Regression | Scenario 2 (n=5000) | MSE (τ=0.05): 0.143 | 16 |
| Quantile Regression | Scenario 2 n=1000 | MSE (τ=0.05): 0.4515 | 16 |
| Quantile Regression | Scenario 1 n=10000 | MSE (τ=0.05): 0.0618 | 16 |
| Quantile Regression | Scenario 1 (n=5000) | MSE (Quantile 0.05): 0.0996 | 16 |
| Policy Value Estimation | Scenario 4 | Policy Value Mean: 6.714 | 15 |
| Policy Value Estimation | Scenario 3 | Policy Value (mean): 1.879 | 15 |
| Individualized Treatment Rule Estimation | Scenario 2 | Policy Value (PV): 1.095 | 15 |
| Individualized Treatment Rule Estimation | Scenario 1 | Policy Value (PV): 1.017 | 15 |
| Multi-agent trajectory planning | 10 agent scenario (ground truth goals) | Trajectory Success Rate: 2.31 | 12 |
| Change point localization | Scenario 1 T=300 | Prop. K_hat != K: 1 | 10 |
| Change point localization | Scenario 1 T=150 | Error Proportion: 0 | 10 |
| Causal Action Execution | Scenario S4 | Success Rate: 100 | 9 |
| Single-Step Action Execution | Scenario S2 | Success Rate: 100 | 9 |
Showing 25 of 128 rows