
Crafter

Benchmarks

| Task Name | Dataset Name | SOTA Result | Trend |
|---|---|---|---|
| Explainable AI | CRAFTER-XAI-Bench (test) | Faithfulness: 0.54 | 25 |
| Sequential environment decision making | Crafter BALROG protocol | Peak Task Score (%): 37.9 | 8 |
| Question Answering | Crafter four-actor rollout robustness | Average Exact Match (EM): 59.3 | 7 |
| Raster-to-SVG conversion | 80 CRAFTER outputs (test) | Position Score: 8.1 | 5 |
| World Modeling | Crafter-OO | Rank @ 1: 18.7 | 5 |
| Reinforcement Learning | Crafter 1M steps | Score: 12.1 | 5 |
| Reinforcement Learning | Crafter | Score: 16.2 | 4 |
| World Model Prediction | Crafter | Avg Imagination MSE: 5.07 | 2 |