
Minecraft

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
|---|---|---|---|---|
| Long-horizon Task Execution | Minecraft Long-horizon Tasks | Wood | 100 | 15 |
| Short-horizon dependency-based functional block utilization | Minecraft | CraftGrid Success@0→10 | 57.5 | 11 |
| Multi-step dependency reasoning | Minecraft | WeaponSet Success@0→10 | 40 | 11 |
| Functionally equivalent reasoning | Minecraft | BridgeEq Success@0→10 | 50 | 11 |
| Structural and shape-based recipe transfer | Minecraft | Bed Success Rate (0->10 steps) | 62.5 | 11 |
| Long-horizon task completion | Minecraft long-horizon task suite (held-out) | Success Rate | 69.7 | 9 |
| GUI Tasks | Minecraft MCU | ASR | 68.4 | 9 |
| Embodied Agent Task Completion | Minecraft Armor Group | Success Rate (SR) | 55.6 | 8 |
| Embodied Agent Task Completion | Minecraft Redstone Group | Success Rate (SR) | 49.4 | 8 |
| Embodied Agent Task Completion | Minecraft Diamond Group | Success Rate (SR) | 66.1 | 8 |
| Embodied Agent Task Completion | Minecraft Gold Group | Success Rate (SR) | 72.3 | 8 |
| Embodied Agent Task Completion | Minecraft Iron Group | Success Rate (SR) | 74 | 8 |
| Embodied Agent Task Completion | Minecraft Stone Group | Success Rate (SR) | 80 | 8 |
| Embodied Agent Task Completion | Minecraft Wood Group | Success Rate (SR) | 95.7 | 8 |
| Sequential Milestone Success Rate | Minecraft Obtain Diamond task | Log Success Rate | 100 | 8 |
| Video Generation | Minecraft | FVD | 62.43 | 8 |
| Long-horizon tasks | Minecraft Overall | Success Rate | 30.29 | 7 |
| Long-horizon tasks | Minecraft Diamond | Success Rate (SR) | 17.36 | 7 |
| Long-horizon tasks | Minecraft Gold | Success Rate (SR) | 21.69 | 7 |
| Long-horizon tasks | Minecraft Iron | Success Rate (SR) | 51.82 | 7 |
| Long-horizon tasks | Minecraft Stone | Success Rate (SR) | 94.53 | 7 |
| Long-horizon tasks | Minecraft Wood | Success Rate (SR) | 97.47 | 7 |
| Closed-loop execution | Minecraft (test) | Single Performance Score | 61.1 | 6 |
| Action-fixed detection | Minecraft (test) | F1 Score | 90.5 | 6 |
| Pairwise Human Preference Study | Minecraft real chunks (val) | Win Rate | 67.1 | 6 |
Showing 25 of 42 rows