
LEMMA

Benchmarks

Task Name | Dataset Name | Metric | SOTA Result | Trend
Robotic Planning | LEMMA Single-Agent Overall | Success Rate (SR) | 80 | 14
Robotic Planning | LEMMA Single-Agent Underspecified | Success Rate (SR) | 99 | 14
Robotic Planning | LEMMA Single-Agent Absence | Success Rate (SR) | 96 | 14
Robotic Planning | LEMMA Single-Agent Multiplicity | Success Rate (SR) | 77 | 14
Robotic Planning | LEMMA Stack and Pass tasks, partially observed (test) | Success Rate | 75 | 8
Egocentric latent state prediction | LEMMA | L2 Error (2s) | 0.058 | 7
Keystep recognition | LEMMA | Top-1 Accuracy | 27.86 | 4
Temporal Grounding | LEMMA D views | Recall@1 | 18 | 4
Robotic Task Planning | LEMMA Single-agent | Calls per Episode | 10.36 | 4
Fault detection | LEMMA | AUROC | 74.1 | 2