BabyAI

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Multi-turn embodied reasoning | BabyAI | Success Rate | 73 | 37 |
| LLM Agent Navigation | BabyAI (test) | Success Rate | 93.3 | 25 |
| One-step next-observation prediction | BabyAI (test) | Token F1 | 93 | 16 |
| Instruction Following | BabyAI | Success Rate | 72.56 | 14 |
| Instruction Following | BabyAI BossLevel | Success Rate | 96.2 | 14 |
| Imitation Learning | BabyAI BossLevel (test) | Success Rate | 72 | 9 |
| Imitation Learning | BabyAI SynthSeq (test) | Success Rate | 0.642 | 9 |
| Imitation Learning | BabyAI GoToSeq (test) | Success Rate | 77.2 | 9 |
| Instruction Following | BabyAI Synthseq | Average Episodic Reward | 0.361 | 7 |
| Instruction Following | BabyAI Pickup | Average Episodic Reward | 0.486 | 7 |
| Instruction Following | BabyAI Goto | Average Episodic Reward | 0.575 | 7 |
| Bosslevel | BabyAI | Average Pass Rate | 0.343 | 7 |
| Synthseq | BabyAI | Average Pass Rate | 32.1 | 7 |
| Pickup | BabyAI | Average Pass Rate | 33.4 | 7 |
| Goto | BabyAI | Average Pass Rate | 0.606 | 7 |
| Representational Alignment | BabyAI instruction set | P@10 | 46.65 | 7 |
| Hierarchical Planning | BabyAI Combined Skills 3 | Token Cost | 2,454 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 2 | Token Cost | 2,528 | 6 |
| Hierarchical Planning | BabyAI Combined Skills 1 | Token Cost | 1,961 | 6 |
| Hierarchical Planning | BabyAI Unlock | Token Cost | 5,705 | 6 |
| Hierarchical Planning | BabyAI Pickup | Token Cost | 2,405 | 6 |
| Language-Conditioned Tasks | BabyAI GoToRedBall | Mean Episodic Return | 0.92 | 5 |
| Navigation | BabyAI | Success Rate | 93.2 | 2 |