LEMMA

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
| --- | --- | --- | --- |
| Robotic Planning | LEMMA Single-Agent Overall | Success Rate (SR): 80 | 14 |
| Robotic Planning | LEMMA Single-Agent Underspecified | Success Rate (SR): 99 | 14 |
| Robotic Planning | LEMMA Single-Agent Absence | Success Rate (SR): 96 | 14 |
| Robotic Planning | LEMMA Single-Agent Multiplicity | Success Rate (SR): 77 | 14 |
| Robotic Planning | LEMMA Stack and Pass tasks, partially observed (test) | Success Rate: 75 | 8 |
| Robotic Task Planning | LEMMA Single-agent | Calls per Episode: 10.36 | 4 |
| Fault detection | LEMMA | AUROC: 74.1 | 2 |