
Minecraft

Benchmarks

| Task Name | Dataset Name | Metric | SOTA Result | Trend |
| --- | --- | --- | --- | --- |
| Long-horizon Task Execution | Minecraft Long-horizon Tasks | Wood | 100 | 15 |
| Short-horizon dependency-based functional block utilization | Minecraft | CraftGrid Success@0→10 | 57.5 | 11 |
| Multi-step dependency reasoning | Minecraft | WeaponSet Success@0→10 | 40 | 11 |
| Functionally equivalent reasoning | Minecraft | BridgeEq Success@0→10 | 50 | 11 |
| Structural and shape-based recipe transfer | Minecraft | Bed Success Rate (0->10 steps) | 62.5 | 11 |
| Embodied Agent Task Completion | Minecraft Armor Group | Success Rate (SR) | 55.6 | 8 |
| Embodied Agent Task Completion | Minecraft Redstone Group | Success Rate (SR) | 49.4 | 8 |
| Embodied Agent Task Completion | Minecraft Diamond Group | Success Rate (SR) | 66.1 | 8 |
| Embodied Agent Task Completion | Minecraft Gold Group | Success Rate (SR) | 72.3 | 8 |
| Embodied Agent Task Completion | Minecraft Iron Group | Success Rate (SR) | 74 | 8 |
| Embodied Agent Task Completion | Minecraft Stone Group | Success Rate (SR) | 80 | 8 |
| Embodied Agent Task Completion | Minecraft Wood Group | Success Rate (SR) | 95.7 | 8 |
| Sequential Milestone Success Rate | Minecraft Obtain Diamond task | Log Success Rate | 100 | 8 |
| Video Generation | Minecraft | FVD | 62.43 | 8 |
| Video Prediction | Minecraft (300 frames) | SSIM | 0.506 | 6 |
| Video Prediction | Minecraft | SSIM | 34.9 | 6 |
| Long-Context Video Prediction | Minecraft 128x128 (test) | SSIM | 0.448 | 6 |
| Open-Ended Instruction Task Execution | Minecraft Open-Ended Instruction Tasks (test) | Torch Success Rate | 75 | 6 |
| Boss Combat | Minecraft Ender Dragon (the End) | Health Ratio | 67.9 | 4 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~200 frames) | PSNR | 14.02 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~128 frames) | PSNR | 14.9 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~64 frames) | PSNR | 16.31 | 3 |
| Interactive World Modeling | Minecraft Interactive Gameplay (0~32 frames) | PSNR | 17.87 | 3 |
| Inverse Dynamics Modeling | Minecraft | Pearson R (X) | 80.29 | 2 |
| Image Classification (Animal Presence Detection) | Minecraft (test) | Top-1 Accuracy | - | 0 |