
Simple

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Backdoor Detection | Simple IHU Llama 8B | AUROC 0.992 | 15 |
| Backdoor Detection | Simple IHU Gemma 2B | AUROC 1 | 15 |
| Mobile Manipulation | Simple (simulation) | Mission Success Rate 100 | 12 |
| Main-effect function estimation | simple High dependence Synthetic | ORMSE 0.172 | 3 |
| Main-effect function estimation | simple Low dependence Synthetic | ORMSE 0.062 | 3 |
| Main-effect function estimation | simple Independent Synthetic | ORMSE 0.013 | 3 |