
lock-in benchmark

Benchmarks

Task Name | Dataset Name | SOTA Result | Trend
--- | --- | --- | ---
Simulation Task Generalization | lock-in benchmark T6 [S] | Success Rate: 13 | 2
Simulation Task Generalization | lock-in benchmark T5 [S] | Success Count: 11 | 2
Simulation Task Generalization | lock-in benchmark T1 | Success Count: 16 | 2